HTTP ETags

I recently redid my site here to make better use of client-side caching.

The first (and obvious) step was to use modified dates. I needed to slightly rework my database to store when the information in a post last changed. For instance, I can edit posts, anyone can comment, and I can remove comments. Any of those changes the post. It turns out that I had not been accurately capturing modified-date information.
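To make the modified-date idea concrete, here is a minimal sketch, not the site's actual code. It assumes a post's modified date is simply the latest of the post's edit time, each comment's time, and the time of the last comment removal. The class, method, and parameter names are all hypothetical. It then compares that date against an If-Modified-Since header at one-second resolution, since HTTP dates carry no fractions of a second.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the names and the storage model are assumptions.
final class LastModifiedSketch {
    // HTTP dates have one-second resolution and are always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    /** Latest of every timestamp that counts as a change to the post. */
    static Instant lastModified(Instant postEdited,
                                List<Instant> commentPosted,
                                Instant lastCommentRemoved) {
        Instant latest = postEdited;
        for (Instant posted : commentPosted) {
            if (posted.isAfter(latest)) {
                latest = posted;
            }
        }
        // A removal changes the page too; null means no comment was ever removed.
        if (lastCommentRemoved != null && lastCommentRemoved.isAfter(latest)) {
            latest = lastCommentRemoved;
        }
        return latest;
    }

    /** Formats a timestamp for a Last-Modified header value. */
    static String httpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** True when the client's copy is still current, so a 304 could be sent. */
    static boolean notModifiedSince(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return false;
        }
        try {
            Instant since = ZonedDateTime.parse(ifModifiedSince.trim(), HTTP_DATE).toInstant();
            // Drop sub-second precision before comparing, to match the header.
            return !lastModified.truncatedTo(ChronoUnit.SECONDS).isAfter(since);
        } catch (DateTimeParseException e) {
            return false; // An unreadable date is treated as "send the full page".
        }
    }
}
```

In a servlet, the same timestamp (as milliseconds) is what an override of HttpServlet.getLastModified would return. The sketch leaves the servlet classes out so that it stands on its own.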

Second (and cooler, in my opinion) was to implement ETags. They are a small part of the HTTP specification (wiki), and they essentially act like fingerprints of what is being sent. Unlike modified dates, the Java Servlet specification and interfaces do not natively support ETags, so I needed to implement them myself.
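Since the servlet API offers no help here, the comparison has to be written by hand. The following is a rough sketch of that check, not the code running on this site. The class and method names are made up for illustration. It takes the client's If-None-Match header and the page's current ETag, and reports whether they match. It makes one simplifying assumption: it splits the header on commas, which is only safe when the tags themselves never contain a comma.

```java
// Hypothetical helper; the names are assumptions for illustration only.
final class IfNoneMatchSketch {
    /**
     * True when the client already holds the current representation,
     * in which case the server could answer 304 Not Modified.
     */
    static boolean matches(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || currentEtag == null) {
            return false;
        }
        String header = ifNoneMatch.trim();
        if (header.equals("*")) {
            return true; // "*" matches any existing representation.
        }
        String current = stripWeakPrefix(currentEtag.trim());
        // Simplification: assumes no tag contains a comma.
        for (String candidate : header.split(",")) {
            if (stripWeakPrefix(candidate.trim()).equals(current)) {
                return true;
            }
        }
        return false;
    }

    // If-None-Match uses weak comparison, so a leading W/ is ignored.
    private static String stripWeakPrefix(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```

In a servlet, the header would come from request.getHeader("If-None-Match"), and a match would be answered with HttpServletResponse.SC_NOT_MODIFIED and no body.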

Unfortunately, the spec does not say what an ETag should be, whether a version number, checksum, hash, or whatever. It does not even say how long one can be. I decided to compute a SHA-512 hash over the relevant information on a page (post, title, modified and posted dates, every comment and its associated names and dates), then, for good measure, convert it to Base64.
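The hashing step described above might look roughly like the sketch below. It is an illustration under a few assumptions, not the actual implementation. The class name and the idea of passing the page's pieces as a list of strings are hypothetical. java.util.Base64 requires Java 8 or later. The length prefix on each field is an extra precaution, added here so that two different splits of the same text cannot produce the same hash.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;

// Hypothetical helper; the names and field layout are assumptions.
final class EtagHashSketch {
    /**
     * Hashes every field that affects the page (title, post, dates,
     * comments, ...) and returns a quoted, Base64-encoded ETag value.
     */
    static String etag(String algorithm, List<String> fields) {
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            for (String field : fields) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                // Length prefix so ("ab", "c") and ("a", "bc") hash differently.
                digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                digest.update(bytes);
            }
            String encoded = Base64.getEncoder().encodeToString(digest.digest());
            // ETag values are sent wrapped in double quotes.
            return "\"" + encoded + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest not available: " + algorithm, e);
        }
    }
}
```

For example, EtagHashSketch.etag("SHA-512", List.of(title, body)) returns a 90-character value: 88 Base64 characters for the 64-byte digest, plus the two quotes.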

So now I have an extra 90-byte header being shuffled back and forth. I've always used compression on my responses, so an extra 90 bytes being transmitted is fine. Plus, if it means not sending what amounts to the same thing again, it's worth it.

Because CPU time is much cheaper than network time, right?

In the spirit of continual improvement, I have switched the ETag hash algorithm from SHA-512 to SHA-256, for bandwidth and performance improvements. Since I am dealing with mere blog posts and not a library's worth of information (yet), I feel the trade-off is worth it. SHA-256 produces a digest half the size, so the header has shrunk from 90 bytes to about 50 bytes. Whether it also hashes faster depends on the platform, since SHA-512 is often the quicker of the two on 64-bit CPUs. I will evaluate SHA-3 when NIST has chosen a winner and code becomes available.
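The size difference can be checked with a little arithmetic. Base64 turns every 3 bytes into 4 characters, rounding up, and the ETag value adds two quote characters. The sketch below works that out for any digest the JVM provides. The class and method names are hypothetical, and it assumes the ETag is the quoted, padded Base64 form of the raw digest.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for estimating the size of an ETag value.
final class EtagSizeSketch {
    /** Length in characters of a quoted, padded Base64 digest. */
    static int quotedBase64Length(String algorithm) {
        try {
            int digestBytes = MessageDigest.getInstance(algorithm).getDigestLength();
            // Base64 emits 4 characters per 3-byte group, rounding up.
            int base64Chars = 4 * ((digestBytes + 2) / 3);
            return base64Chars + 2; // Plus the surrounding double quotes.
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest not available: " + algorithm, e);
        }
    }
}
```

With the standard JDK providers this gives 90 for "SHA-512" (64 bytes, 88 characters, 2 quotes) and 46 for "SHA-256" (32 bytes, 44 characters, 2 quotes), which lines up with the "about 50 bytes" figure once the header name is counted.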

Posted by the Andrew Bailey.
