How can I serve a "Linearized" PDF file from a servlet?

Garth Somerville

[More of the question: I am using a servlet to stream the bytes of a PDF file to a user's browser. That part works; it's just streaming out a file. The problem is with what Acrobat calls "Linearized" files. When served by a Netscape Enterprise Server, a linearized file displays its first page in Acrobat Reader while the remaining pages are still loading. When my servlet is the file source, the entire document must download before I can see the first page.]

To make a document appear to load more quickly, Acrobat Reader relies on two things. First, the PDF file must of course be in linearized form, which the author of the PDF should be able to produce. Second, since you are serving the file from a servlet, your servlet must support the HTTP protocol features that Acrobat depends on.

In short, when Acrobat receives the first part of the file, it can tell whether the file is linearized. If it is, Acrobat caches the tables at the beginning of the file and then closes the connection, so the entire document is not downloaded all at once. From then on, Acrobat fetches each page or object from the web server as it is needed. It does this by sending HTTP byte range headers in its requests, and your servlet must support these for this to work.
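As a quick server-side sanity check that a file really is linearized, one can look at its first bytes: the PDF specification requires the linearization parameter dictionary, which carries the /Linearized key, to lie entirely within the first 1024 bytes of the file. The sketch below is only a heuristic under that assumption; the class LinearizedCheck is a hypothetical name, and InputStream.readNBytes needs Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: a rough test for a linearized PDF. */
final class LinearizedCheck {
    // The PDF specification places the linearization parameter dictionary
    // entirely within the first 1024 bytes of a linearized file.
    static final int WINDOW = 1024;

    /** Heuristic: does the start of the data mention the /Linearized key? */
    static boolean looksLinearized(byte[] data) {
        int n = Math.min(data.length, WINDOW);
        // ISO-8859-1 maps each byte to exactly one char, so binary bytes are harmless.
        String head = new String(data, 0, n, StandardCharsets.ISO_8859_1);
        return head.contains("/Linearized");
    }

    /** Reads only the first WINDOW bytes of the file and applies the same test. */
    static boolean looksLinearized(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return looksLinearized(in.readNBytes(WINDOW));
        }
    }
}
```

For example, if LinearizedCheck.looksLinearized(Path.of("example.pdf")) returns false, the file probably needs to be re-saved in linearized form (Acrobat calls this "Fast Web View") before byte serving will help. A match is not proof: the dictionary also records the file length (/L), and a file changed after linearization may still carry the key.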

The byte range headers are described in the HTTP/1.1 specification, RFC 2616, section 14.35. Briefly, any HTTP request may carry an extra Range header indicating that the client wants only part of the entity rather than the full entity. The header gives the byte range the client is asking for, written as the positions of the first and last bytes (counting from zero, both inclusive) rather than as an offset and length. An example is:

GET /example.pdf HTTP/1.1
Range: bytes=0-1023

Here, the client is asking for only the first 1K of the file.
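As a rough illustration, here is a minimal sketch of reading such a header in Java. It assumes a single range per request; the class ByteRange and its UNSATISFIABLE marker are hypothetical names for this example. Following RFC 2616, both positions are inclusive, an open end such as bytes=500- runs to the end of the file, and a suffix such as bytes=-500 means the last 500 bytes.

```java
/**
 * Hypothetical helper: one inclusive byte range parsed from a Range header.
 * Only a single range is handled; multi-range requests are ignored.
 */
final class ByteRange {
    /** Marker for a range that starts past the end of the file (answer 416). */
    static final ByteRange UNSATISFIABLE = new ByteRange(-1, -1);

    final long start; // first byte position, inclusive
    final long end;   // last byte position, inclusive

    private ByteRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Parses "bytes=0-1023", "bytes=500-" or "bytes=-500" for an entity of the
     * given size. Returns null when the header should simply be ignored
     * (absent, malformed, or several ranges); the caller then sends the whole file.
     */
    static ByteRange parse(String header, long size) {
        if (header == null || !header.startsWith("bytes=")) {
            return null;
        }
        String spec = header.substring("bytes=".length()).trim();
        if (spec.indexOf(',') >= 0) {
            return null; // several ranges: not handled in this sketch
        }
        int dash = spec.indexOf('-');
        if (dash < 0) {
            return null;
        }
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        try {
            if (first.isEmpty()) {
                // Suffix form "-N": the final N bytes of the file.
                if (last.isEmpty()) {
                    return null;
                }
                long n = Long.parseLong(last);
                if (n <= 0 || size == 0) {
                    return UNSATISFIABLE;
                }
                return new ByteRange(Math.max(0L, size - n), size - 1);
            }
            long start = Long.parseLong(first);
            long end = last.isEmpty() ? Long.MAX_VALUE : Long.parseLong(last);
            if (end < start) {
                return null; // e.g. "500-100": invalid syntax, ignore the header
            }
            if (start >= size) {
                return UNSATISFIABLE;
            }
            // An end past the file (or an open end) is clamped to the last byte.
            return new ByteRange(start, Math.min(end, size - 1));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

When parse returns null, the servlet can ignore the header and send the whole file with 200 OK, which the RFC permits. UNSATISFIABLE corresponds to a "416 Requested Range Not Satisfiable" reply that carries the current size, e.g. Content-Range: bytes */5000. A request listing several comma-separated ranges would need a multipart/byteranges response, which this sketch leaves out.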

The key to making this work in HTTP/1.1 is the way the server responds to such requests. If the server does not understand byte ranges, it ignores the Range header, replies with the usual "200 OK", and sends the entire file. But if it does understand them, it replies with "206 Partial Content", states which bytes it is returning in a Content-Range header (for example, Content-Range: bytes 0-1023/5000 for the request above against a 5000-byte file), and sends only the requested portion. A server can also advertise this support up front with an Accept-Ranges: bytes header. Looking at the response code is how client programs are able to report whether a server supports "resumable downloads." If you program your servlet to support these requests for partial content, I think you'll get the result you want.
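The server side described above might be sketched like this. It is a sketch under several assumptions, not a complete byte server: the class PartialContent and its serve method are hypothetical, the range is assumed to have been parsed and checked against the file size already, and the status and headers are passed through callbacks rather than an HttpServletResponse so the code compiles and runs without a servlet container. Status and headers are set before any body bytes are written, because a servlet response ignores them once it has been committed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.util.function.BiConsumer;
import java.util.function.IntConsumer;

/**
 * Hypothetical helper: sends a whole file (200) or one inclusive byte range (206).
 * Status and headers go through callbacks, so a servlet can pass
 * response::setStatus and response::setHeader, and a test can pass lambdas.
 */
final class PartialContent {
    static final int SC_OK = 200;
    static final int SC_PARTIAL_CONTENT = 206;

    /**
     * Pass first = -1 when there is no usable Range header. Otherwise first..last
     * must already be checked against the file size (0 <= first <= last < size).
     */
    static void serve(RandomAccessFile file, long first, long last,
                      IntConsumer setStatus, BiConsumer<String, String> setHeader,
                      OutputStream out) throws IOException {
        long size = file.length();
        // Tell the client that byte ranges are supported, on every response.
        setHeader.accept("Accept-Ranges", "bytes");
        if (first < 0) {
            // No range requested: send the full entity with 200 OK.
            first = 0;
            last = size - 1;
            setStatus.accept(SC_OK);
        } else {
            // Partial response: say which bytes follow and the total size.
            setStatus.accept(SC_PARTIAL_CONTENT);
            setHeader.accept("Content-Range", "bytes " + first + "-" + last + "/" + size);
        }
        long length = last - first + 1;
        setHeader.accept("Content-Length", Long.toString(length));
        copy(file, first, length, out);
    }

    /** Copies length bytes starting at offset from the file to out. */
    private static void copy(RandomAccessFile file, long offset, long length,
                             OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        file.seek(offset);
        while (length > 0) {
            int n = file.read(buf, 0, (int) Math.min(buf.length, length));
            if (n < 0) {
                throw new EOFException("file ended before the requested range");
            }
            out.write(buf, 0, n);
            length -= n;
        }
    }
}
```

Inside a servlet's doGet, after response.setContentType("application/pdf") and with a RandomAccessFile opened on the PDF, the call could be PartialContent.serve(raf, first, last, response::setStatus, response::setHeader, response.getOutputStream()), passing first = -1 when the request had no usable Range header.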

[Hmm, sounds a little dubious... Has anyone tried this? -Alex]