Toilet Blog Engine, Version 8
Since I've been stuck at home for a few months, I've been updating this blog. There have been some major improvements, because the whole stack has been upgraded: the OS (Xubuntu 16.04 to Xubuntu 20.04), Postgres (9.5 to 12), JVM (8 to 11), and web server (Payara 5.191 to 5.2020.3). (There was one Payara version that enabled TLS 1.3, but it's bugged. Maybe I'll try next time!)
With PostgreSQL 12, I finally have access to the websearch_to_tsquery function for searching. You can use quotes to force include something, and hyphens to exclude something. However, naively connecting trigrams to it like I did destroys the cool functionality, so I dropped the trigrams. I've built a search suggestion feature to cover for them; try it out.
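For the curious, here is a minimal sketch of what a websearch_to_tsquery search might look like over plain JDBC. The table and column names (article, title, search_vector) and the 'english' configuration are assumptions made up for illustration, not this blog's actual schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; table, columns, and text search config are assumed.
class WebSearchSketch {
    // websearch_to_tsquery parses search-engine style input:
    // "quoted phrases" must appear together, and -word excludes a word.
    static final String SQL =
          "SELECT title FROM article "
        + "WHERE search_vector @@ websearch_to_tsquery('english', ?) "
        + "ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', ?)) DESC "
        + "LIMIT ?";

    // Returns matching titles, best rank first.
    static List<String> search(Connection conn, String userInput, int limit) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, userInput); // bound as a parameter, never concatenated
            ps.setString(2, userInput);
            ps.setInt(3, limit);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> titles = new ArrayList<>();
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
                return titles;
            }
        }
    }
}
```

With something like this, a query such as "final boss" -spoilers would ask for the exact phrase while excluding articles that mention spoilers.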
You might have noticed that I have a lot of gaming articles. Many are installments in a series, and I would like to show links to my other articles in the series if available. For example, if you're reading "Game 3: Showdown", it would be cool to have prominent links to my "Game 1: Apprehension Rising", "Game 2: The Gamining", and "Game: Spinoff" articles. So I've stolen an idea from my day job, and implemented a "you might also like" feature at the bottom of every article. A regex tries to pull out the most pertinent words of the article's title (everything up to the first comma or colon), and searches with it. (I've included the exact term in an HTML comment near the suggestions.) Because titles weigh heavily in my search index, other articles with that term in the title rank high. Sometimes it suggests an article where I mention or compare that topic, which helps break search bubbles. If there are fewer than 6 suggestions, it fills the rest with the most recent articles of the article's section.
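The two steps described above (pull a term out of the title, then pad the suggestions) could be sketched roughly like this. The class and method names are hypothetical, the regex is only a literal reading of "everything up to the first comma or colon", and articles are represented as plain title strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative and not taken from the blog's code.
class SeriesSuggestions {
    // Capture everything before the first comma or colon in a title.
    private static final Pattern LEADING_TERM = Pattern.compile("^([^,:]+)");

    // Returns the search term for a title, or the whole title if there is no match.
    static String searchTerm(String title) {
        Matcher m = LEADING_TERM.matcher(title);
        return m.find() ? m.group(1).trim() : title.trim();
    }

    // Keep up to `limit` search hits, then pad with recent articles, skipping duplicates.
    static List<String> fill(List<String> searchHits, List<String> recent, int limit) {
        Set<String> picked = new LinkedHashSet<>(); // preserves insertion order
        for (String hit : searchHits) {
            if (picked.size() >= limit) break;
            picked.add(hit);
        }
        for (String article : recent) {
            if (picked.size() >= limit) break;
            picked.add(article);
        }
        return new ArrayList<>(picked);
    }
}
```

Under this reading, searchTerm("Game 3: Showdown") yields "Game 3", and fill(hits, recent, 6) tops up a short hit list with recent articles from the section.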
My RSS system has been humming for a long time. I write a class to create XML, and register it. In a perfect environment, I wanted to add an annotation to a class, scan for that annotation, and add it automatically. That's not easily possible, so I worked around it in a semi-clean way, and that's worked for years. I've decided to remove that annotation. I've also wanted to dynamically create feeds, but the interface only supports one URL per feed. Once upon a time, I intended that URL parameters would be used for this, but that's not very RESTful. I've changed that, so now I have feeds for each category, and separate comment feeds on each article. This might have come too late, since social media has eaten RSS. (Some people still hate Google for discontinuing Reader.) But like fashion, some technologies come back in style after a generation, so this might be useful in the 2030s.
On a more aesthetic note, I've changed my monospace font to Nouveau IBM. It's reminiscent of the font used in DOS and BIOS setup screens from the 90s and 2000s. It looks like the font that Twentieth Century uses! I've set the color to monochrome amber to pile on the nostalgia, even though I don't have any nostalgia for monochrome screens. Not to be left out, I've changed all of my terminal windows to use the same.
Since I started reading Hacker News for my podcasts, I've noticed some new image formats coming. JPEG XL has potential, but is still under heavy development. AVIF is more mature and has more implementations. (Yes, even Paint!) I'm very impressed with my game screenshots in AVIF, because AVIF seems about twice as efficient as JPEG on a per-byte basis. My posts now have <picture> elements with AVIF and JPEG, so this blog is good to go when browsers flip the AVIF switch. Because of <picture>, I no longer do HEAD requests to check for higher resolution images. Firefox has had a flag for a while (what I've been using to test), and will release for good soon, along with Chrome.
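To illustrate the idea, here is a small sketch of how such markup could be generated. The helper name and the URLs in the usage note are made up; the point is only that the browser takes the first source whose type it supports and otherwise falls back to the plain img.

```java
// Hypothetical helper: builds <picture> markup offering AVIF first, with JPEG as the fallback.
class PictureMarkup {
    static String picture(String avifUrl, String jpegUrl, String alt) {
        return "<picture>"
             // Browsers that understand image/avif pick this source...
             + "<source srcset=\"" + escape(avifUrl) + "\" type=\"image/avif\">"
             // ...and everything else falls back to the JPEG in <img>.
             + "<img src=\"" + escape(jpegUrl) + "\" alt=\"" + escape(alt) + "\">"
             + "</picture>";
    }

    // Minimal escaping for text placed inside double-quoted attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}
```

Calling PictureMarkup.picture("/img/shot.avif", "/img/shot.jpg", "Boss fight") produces one picture element with an AVIF source and a JPEG fallback, with no extra request needed to discover the better file.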
As a fan of fast computers, I constantly look for ways to make this blog faster. It was pretty obvious that I couldn't pull stuff from the database and put it on a page any faster. I had to look elsewhere. What if I didn't mess with the database at all? What if I cache the entire page and return it, instead of creating it again? I've considered a reverse proxy in the past, like Nginx or Varnish, but more servers means more problems.
I investigated some caching features. Through the original Sun Java App Server, there is a JSP caching tag. It seemed to work well, but there is no easy way to invalidate the entire cache, or parts of it. I could use in-memory databases, but that usually involves a cross-process call. If I do that, I might as well keep using Postgres.
JSR 107 is a Java standard that defines a caching interface. I spent about a week's worth of evenings trying to understand it enough to shoddily implement it with a HashMap. Even now, it doesn't quite work right, as it still runs as a singleton EJB. A filter uses cache headers to see if something can be cached and captures it, and another filter checks and retrieves cached pages. It works so well that it caches RSS feeds, images, CSS, and JavaScript files. Over the years, I've seen my blog go from creating pages in 50 milliseconds down to around 20 or 15, as measured by the footer. Now, my browser tools report that my server can spit it out in 1.
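A stripped-down sketch of the capture-and-lookup idea follows. It is not the JSR 107 implementation described above: it leaves out the servlet filters and the EJB, swaps the HashMap for a ConcurrentHashMap so it is safe across request threads, and assumes a simple policy where only responses marked "public" get cached. All names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical whole-page cache keyed by URL; a sketch, not the blog's actual code.
class PageCacheSketch {
    static final class Entry {
        final String contentType;
        final byte[] body;

        Entry(String contentType, byte[] body) {
            this.contentType = contentType;
            this.body = body;
        }
    }

    private final Map<String, Entry> pages = new ConcurrentHashMap<>();

    // Assumed policy: cache only when Cache-Control says "public" and nothing forbids storing.
    static boolean isCacheable(String cacheControl) {
        if (cacheControl == null) return false;
        boolean isPublic = false;
        for (String part : cacheControl.toLowerCase(Locale.ROOT).split(",")) {
            String directive = part.trim();
            if (directive.equals("no-store") || directive.equals("no-cache")
                    || directive.equals("private")) {
                return false;
            }
            if (directive.equals("public")) isPublic = true;
        }
        return isPublic;
    }

    // What a capturing filter would call after the page has been rendered.
    void capture(String url, String cacheControl, String contentType, byte[] body) {
        if (isCacheable(cacheControl)) {
            pages.put(url, new Entry(contentType, body.clone()));
        }
    }

    // What a lookup filter would call before any rendering happens.
    Optional<Entry> lookup(String url) {
        return Optional.ofNullable(pages.get(url));
    }

    // Invalidate one page, or everything (for example after publishing an article).
    void invalidate(String url) { pages.remove(url); }
    void clear() { pages.clear(); }
}
```

The appeal of this shape is that a cache hit never touches the database or the templates; the cost is that every write path has to remember to call invalidate or clear.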
My comment section requires a session, and each comment form is unique to you, because that stops a lot of bots and spam. If that's cached for everyone, it won't work right. Because I put the comment sections on a separate page and have them in an <iframe>, the page (blog post) loads and exists separately from its comments. I've considered having a separate cache for comments on each session, but that's not needed right now.
I've also been improving the out-of-the-box experience, i.e. what happens when you run it for the first time. This blog needs some database tables and configurations that are separate from the web server. I've simplified it so that the blog will create and load its own stuff automatically. New installs will show a helpful setup page where you can customize things (like passwords) before going for real. I also built "plunge": a feature to upload a backup and skip this mess, if this is not the first time you've set up this blog.
I decided that the default style needs to be very different than my normal one. Since my current blog design was trending towards synthwave, I wanted a vaporwave aesthetic for a default. I hacked one together, taking inspiration from vaporwave cover art and Windows 95. Once finished, I wondered if I could run both the normal style and the vaporwave one side-by-side, as if it were an Easter egg. Since I have a lot of i18n infrastructure, and vaporwave is sorta associated with Japan, I made a "Japanese" version of my site to access it. It's a good test of the feature, and after fixing bugs (turns out it never worked as intended), it works! Don't you love these buttons? They don't make buttons like these anymore! I don't care what error appears; I want to click them!
After matching blog posts about improving the site with git code commits, I've finally decided to number the versions. Instead of guessing which version number I'm on, I've decided this is version 8. Here are some highlights of previous versions, with approximate years:
- 1.0 (~2009) happened when I left Neumont (my college).
- 1.1 (~2010) added data transfer objects.
- 2.0 (2011) added ETags, backups, and request tokens (to prevent CSRF).
- 3.0 (~2012) added internationalization features, and had no DTOs (from 1.1) because they are stupid for a blog.
- 4.0 (2013) added Markdown authoring.
- 4.5 (~2015) added HTTPS features and explicit database indexes.
- 5.0 (2016) implemented article summaries and search.
- 6.0 (~2018) added AMP and multithreaded processes.
- 7.0 (2019) implemented search trigrams and a health page.
- 7.5 (early 2020) had an OOTB (out-of-the-box) prototype.
- 8.0 (2020) is the one you're reading about right now; the one with page caching.
Don't forget that you can steal all this code from the GitHub repository. You probably won't, but I still dare you.