What is the difference between URL encoding, URL rewriting, HTML escaping, and entity encoding?

Alex Chaffee

URL Encoding is a process of transforming user input to a CGI form so it is fit for travel across the network -- basically, stripping spaces and punctuation and replacing with escape characters. URL Decoding is the reverse process. To perform these operations, call java.net.URLEncoder.encode() and java.net.URLDecoder.decode() (the latter was (finally!) added to JDK 1.2, aka Java 2).

Example: changing "We're #1!" into "We%27re+%231%21"

URL Rewriting is a technique for saving state information on the user's browser between page hits. It's sort of like cookies, only the information gets stored inside the URL, as an additional parameter. The HttpSession API, which is part of the Servlet API, sometimes uses URL Rewriting when cookies are unavailable.

Example: changing <A HREF="nextpage.html"> into
<A HREF="nextpage.html;$sessionid$=DSJFSDKFSLDFEEKOE"> (or whatever the actual syntax is; I forget offhand)

(Unfortunately, the method in the Servlet API for doing URL rewriting for session management is called encodeURL(). Sigh...)

There's also a feature of the Apache web server called URL Rewriting; it is enabled by the mod_rewrite module. It rewrites URLs on their way in to the server, allowing you to do things like automatically add a trailing slash to a directory name, or to map old file names to new file names. This has nothing to do with servlets. For more information, see the Apache FAQ (http://www.apache.org/docs/misc/FAQ.html#rewrite-more-config) .