How can I find the size of a remote HTML document?

Tim Rohaly

You can use the getContentLength() method of URLConnection, as the example below shows. This only works if the server accurately reports the length of the document in the Content-Length header of the HTTP response to your request; when no length is reported, getContentLength() returns -1. Dynamically generated pages often omit the content length because it is not known in advance. In those cases, the only way to find the length is to download the entire page and count the bytes as they arrive.

Also remember that the reported content length covers only the base document; it does not include any images the page references. Those lengths must be determined separately, one request per image.

import java.net.*;
import java.io.*;

public class Size {

    public static void main(String[] args) {
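        try {
            URL url = new URL("http://www.octanecreative.com/ducttape/");
            URLConnection connection = url.openConnection();
            // getContentLength() returns -1 if the server does not report a length.
            System.out.println("Length = " + connection.getContentLength());
        } catch (MalformedURLException e) {
            System.err.println("Malformed URL: " + e.getMessage());
        } catch (IOException e) {
            System.err.println("I/O error: " + e.getMessage());
        }
    }
}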
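
For pages that do not report a content length, the download-and-count fallback described above might look roughly like the sketch below. The class name CountingSize and the helper methods countBytes and sizeOf are hypothetical names chosen for illustration, not part of the original answer. The URLs are taken from the command line, so you could pass the page itself followed by the URL of each image it references to get a combined total.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class CountingSize {

    // Hypothetical helper: reads the stream to the end and returns the byte count.
    public static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    // Hypothetical helper: trusts the reported length when there is one,
    // otherwise downloads the whole resource and counts its bytes.
    public static long sizeOf(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        int reported = connection.getContentLength();
        if (reported >= 0) {
            return reported;
        }
        try (InputStream in = connection.getInputStream()) {
            return countBytes(in);
        }
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        // Each argument is assumed to be an absolute URL (page or image).
        for (String arg : args) {
            long size = sizeOf(new URL(arg));
            System.out.println(arg + " = " + size);
            total += size;
        }
        System.out.println("Total = " + total);
    }
}
```

Note that sizeOf still relies on the server's reported value whenever one is present, so an inaccurate Content-Length header will produce an inaccurate result. If that is a concern, call countBytes unconditionally at the cost of always downloading the full resource.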
