As you know, The Polyglot Developer is a static generated website that is built with Hugo. Static generated websites are generally much faster than the CMS alternatives, but being fast doesn’t mean they pass all of Google’s tests by default.
In an ongoing effort to satisfy Google PageSpeed Insights and Lighthouse when it comes to search engine optimization (SEO) and other best practices, I was led to the progressive web application (PWA) test. There are many factors that determine if something is a PWA, but one of them is the use of service workers. In case you’re unfamiliar, service workers accomplish many things, with the most common being caching.
Implementing service workers in an application is not necessarily the most complicated task, but as your applications evolve, things might become more chaotic. This is where Workbox comes in. With Workbox, you can use very clean APIs to pre-cache your static site resources as well as cache resources at runtime. We’re going to see how to use Workbox to implement service workers for caching Hugo content and other resources such as images, fonts, and scripts.
Before getting too invested, I wanted to point out that the concepts of this tutorial can be applied to other static site generators such as Jekyll and not just Hugo. Also, we’re going to have a heavy dependency on Gulp and post-processing build scripts. If you haven’t already read my tutorial, Getting Familiar with Gulp for Workflow Automation, I recommend you do. I use Gulp for everything on The Polyglot Developer, and it plays a part when it comes to pre-caching and the configuration of Workbox.
Let’s assume you’ve already got a Hugo project and you’re ready to add a service worker with Workbox. The first step is to install the appropriate packages as part of our post-build process.
At the root of your Hugo project, execute the following:
npm init -y
npm install gulp --save-dev
npm install workbox-build --save-dev
npm install gulp-clean --save-dev
npm install gulp-shell --save-dev
The above commands will create a new package.json file at the root of your project and install Gulp as well as Workbox locally in the project. We’re also installing packages for cleaning our project and running shell commands, which will aid in the build process. As you can probably guess, the Node Package Manager (NPM) is a requirement for the above commands.
At the root of your project, create a gulpfile.js file and include the following boilerplate JavaScript code:
const gulp = require("gulp");
const clean = require("gulp-clean");
const shell = require("gulp-shell");
const workbox = require("workbox-build");
gulp.task("clean", function () { });
gulp.task("hugo-build", shell.task(["hugo"]));
gulp.task("generate-service-worker", () => { });
gulp.task("build", gulp.series("clean", "hugo-build", "generate-service-worker"));
In the above code we’ve imported the dependencies that were previously downloaded and created four tasks. Technically, the only truly important task is the generate-service-worker task, but I thought you might find value in the others as well. As far as the process goes, when we run our build task, the project is first cleaned, then Hugo builds fresh HTML output, and then a service worker is created with pre-caching based on that HTML output.
So starting with the first two tasks:
gulp.task("clean", function () {
    return gulp.src("public", { read: false, allowEmpty: true })
        .pipe(clean());
});
gulp.task("hugo-build", shell.task(["hugo"]));
When we build with Hugo, we expect a new public directory to be created. The clean task just removes that directory. The hugo-build task will just execute a hugo command which will build that public directory.
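If you’d rather not depend on gulp-clean, the same cleanup can be sketched with nothing but Node’s built-in modules. This is an alternative, not what the gulpfile above does, and it assumes Node 14.14 or newer for fs.rmSync. The cleanDirectory helper and the temporary sandbox directory are names I made up for the demonstration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: remove a build directory if it exists.
// `force: true` keeps this from throwing when the directory is missing,
// similar in spirit to `allowEmpty: true` in the Gulp task.
function cleanDirectory(dir) {
    fs.rmSync(dir, { recursive: true, force: true });
    return !fs.existsSync(dir);
}

// Demonstration against a throwaway directory rather than a real Hugo project.
const sandbox = fs.mkdtempSync(path.join(os.tmpdir(), "hugo-clean-"));
const publicDir = path.join(sandbox, "public");
fs.mkdirSync(path.join(publicDir, "css"), { recursive: true });
fs.writeFileSync(path.join(publicDir, "css", "site.css"), "body{}");

const removed = cleanDirectory(publicDir);
console.log("public removed:", removed);

// Tidy up the sandbox itself.
fs.rmSync(sandbox, { recursive: true, force: true });
```

In a real project you would call the helper with the path to your own public directory before running hugo.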
So let’s look at the code that matters. The code for creating our service worker should look something like this:
gulp.task("generate-service-worker", () => {
    return workbox.generateSW({
        cacheId: "thepolyglotdeveloper",
        globDirectory: "./public",
        globPatterns: [
            "**/*.{css,js,eot,ttf,woff,woff2,otf}"
        ],
        swDest: "./public/sw.js",
        modifyUrlPrefix: {
            "": "/"
        },
        clientsClaim: true,
        skipWaiting: true,
        ignoreUrlParametersMatching: [/./],
        offlineGoogleAnalytics: true,
        maximumFileSizeToCacheInBytes: 50 * 1024 * 1024,
        runtimeCaching: [
            {
                urlPattern: /(?:\/)$/,
                handler: "staleWhileRevalidate",
                options: {
                    cacheName: "html",
                    expiration: {
                        maxAgeSeconds: 60 * 60 * 24 * 7,
                    },
                },
            },
            {
                urlPattern: /\.(?:png|jpg|jpeg|gif|bmp|webp|svg|ico)$/,
                handler: "cacheFirst",
                options: {
                    cacheName: "images",
                    expiration: {
                        maxEntries: 1000,
                        maxAgeSeconds: 60 * 60 * 24 * 365,
                    },
                },
            },
            {
                urlPattern: /\.(?:mp3|wav|m4a)$/,
                handler: "cacheFirst",
                options: {
                    cacheName: "audio",
                    expiration: {
                        maxEntries: 1000,
                        maxAgeSeconds: 60 * 60 * 24 * 365,
                    },
                },
            },
            {
                urlPattern: /\.(?:m4v|mpg|avi)$/,
                handler: "cacheFirst",
                options: {
                    cacheName: "videos",
                    expiration: {
                        maxEntries: 1000,
                        maxAgeSeconds: 60 * 60 * 24 * 365,
                    },
                },
            }
        ],
    });
});
There is a lot going on in the above generate-service-worker task, so we’re going to break it down.
Using the globDirectory property we can specify our working path for pre-caching and similar. This directory should be the static generated HTML and resources that were created by Hugo. When it comes to pre-caching, we can specify the files we want in the globPatterns property.
globPatterns: [
    "**/*.{css,js,eot,ttf,woff,woff2,otf}"
],
In the above example, we are saying that we want to pre-cache all CSS, JavaScript, and font files located anywhere within the public directory. Now you might be wondering why we’re not pre-caching our HTML, images, and other media. You could, but if your blog is anything like The Polyglot Developer, you’re going to eat up a lot of space on your users’ computers as well as take up a lot of processing and bandwidth up front. The Polyglot Developer is more than 500MB in size, so to pre-cache that on everyone’s computer or mobile device would probably not be wise. Only pre-cache what is absolutely necessary.
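To get a feel for which files that pattern picks up, here is a rough sketch that checks file extensions against the same list. It is only a stand-in for real glob matching, not how Workbox does it internally, and the file listing is hypothetical.

```javascript
// Extensions taken from the globPatterns entry in the gulpfile.
const PRECACHE_EXTENSIONS = ["css", "js", "eot", "ttf", "woff", "woff2", "otf"];

// Rough stand-in for "**/*.{...}": true when the path ends in one of the extensions.
function wouldPrecache(filePath) {
    if (!filePath.includes(".")) {
        return false;
    }
    const extension = filePath.split(".").pop();
    return PRECACHE_EXTENSIONS.includes(extension);
}

// Hypothetical listing of a public directory.
const sampleFiles = [
    "css/bundle.min.css",
    "js/bundle.min.js",
    "fonts/FontAwesome.otf",
    "index.html",
    "images/logo.png",
    "blog/my-post/index.html"
];

const precached = sampleFiles.filter(wouldPrecache);
console.log(precached);
// [ 'css/bundle.min.css', 'js/bundle.min.js', 'fonts/FontAwesome.otf' ]
```

The HTML and image files fall through, which is exactly what we want since those are handled at runtime instead.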
Using the swDest property you can specify the output of the service worker. Essentially this Gulp task will spit out a JavaScript file that your pages then register as the service worker. This output file contains our service worker logic. It should exist at the root level of your output directory.
Based on how Hugo works, we also need to specify a prefix in our cache path. If we don’t do this, we’ll be caching files, but they will never be used.
modifyUrlPrefix: {
    "": "/"
},
The above modifyUrlPrefix will prefix a slash to every file path. Feel free to modify that prefix to meet the needs of your application. This brings us to the following lines:
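The effect of that prefix map can be sketched in a few lines. This is my own approximation of the rewriting rule (if a path starts with the key, swap the key for the value), not the Workbox source, and applyUrlPrefix is a hypothetical helper name.

```javascript
// Sketch of prefix rewriting: if a URL starts with a key, swap that key for its value.
function applyUrlPrefix(url, prefixMap) {
    for (const [from, to] of Object.entries(prefixMap)) {
        if (url.startsWith(from)) {
            return to + url.slice(from.length);
        }
    }
    return url;
}

// Every string starts with the empty string, so every path gets the leading slash.
const rewritten = ["css/bundle.min.css", "fonts/FontAwesome.otf"]
    .map(url => applyUrlPrefix(url, { "": "/" }));
console.log(rewritten);
// [ '/css/bundle.min.css', '/fonts/FontAwesome.otf' ]

// A hypothetical non-empty prefix works the same way.
console.log(applyUrlPrefix("static/app.js", { "static/": "/assets/" }));
// /assets/app.js
```

The rewritten paths line up with the URLs the browser actually requests, which is why the pre-cached files end up being used.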
clientsClaim: true,
skipWaiting: true,
ignoreUrlParametersMatching: [/./],
offlineGoogleAnalytics: true,
maximumFileSizeToCacheInBytes: 50 * 1024 * 1024,
Since Hugo rebuilds everything and we’re not depending on single page application (SPA) functionality, we can force the service worker to immediately activate upon installation using the clientsClaim and skipWaiting properties. We can also ignore all URL query parameters when matching pre-cached files, in case some of your URLs have UTM information or similar. The parameters are only ignored for the cache lookup, so my affiliate tags and similar aren’t stripped out of the actual URLs.
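Here is a small sketch of what ignoring query parameters means for a cache lookup. It mimics the idea with the standard URL class and is not Workbox’s own code. The stripIgnoredParameters helper and the example.com URLs are hypothetical.

```javascript
// Drop every query parameter whose name matches one of the patterns.
function stripIgnoredParameters(href, ignorePatterns) {
    const url = new URL(href);
    // Copy the names first so we are not deleting while iterating.
    const names = Array.from(url.searchParams.keys());
    for (const name of names) {
        if (ignorePatterns.some(pattern => pattern.test(name))) {
            url.searchParams.delete(name);
        }
    }
    return url.href;
}

const requested = "https://example.com/css/bundle.min.css?utm_source=newsletter&tag=affiliate-20";

// /./ matches any parameter name, so the lookup key has no query string left.
const lookupKey = stripIgnoredParameters(requested, [/./]);
console.log(lookupKey);
// https://example.com/css/bundle.min.css

// A narrower pattern would only ignore the UTM parameters.
const utmOnly = stripIgnoredParameters(requested, [/^utm_/]);
console.log(utmOnly);
// https://example.com/css/bundle.min.css?tag=affiliate-20
```

The requested URL itself is left untouched. Only the key used to find the cached file changes.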
Some sites host large files, so we aren’t going to pre-cache files that are larger than 50MB in size, which is already very generous.
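For reference, this is what that size limit works out to in bytes, along with a quick filter over some made-up file sizes. The file names and sizes are hypothetical and only there to show the comparison.

```javascript
// Same expression as in the gulpfile.
const maximumFileSizeToCacheInBytes = 50 * 1024 * 1024;
console.log(maximumFileSizeToCacheInBytes);
// 52428800

// Hypothetical build output with sizes in bytes.
const builtFiles = [
    { url: "/js/bundle.min.js", size: 180 * 1024 },
    { url: "/js/huge-vendor.js", size: 75 * 1024 * 1024 }
];

// Anything over the limit is left out of the pre-cache list.
const withinLimit = builtFiles
    .filter(file => file.size <= maximumFileSizeToCacheInBytes)
    .map(file => file.url);
console.log(withinLimit);
// [ '/js/bundle.min.js' ]
```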
This brings us to runtime caching which is different than pre-caching. Instead of caching files upon first load of the website, we’re going to cache files as they are used. We can specify a different caching strategy for different types of files if necessary. For example:
{
    urlPattern: /\.(?:png|jpg|jpeg|gif|bmp|webp|svg|ico)$/,
    handler: "cacheFirst",
    options: {
        cacheName: "images",
        expiration: {
            maxEntries: 1000,
            maxAgeSeconds: 60 * 60 * 24 * 365,
        },
    },
},
The above strategy will cache images and expire them after a year. The images will also start to expire if there are more than 1000 in the cache. Notice the cacheFirst type used. There are numerous options as outlined in the Workbox documentation, but cacheFirst says that the cache will be tried first. If the image doesn’t appear in cache, then it will be requested from the network. This is good for cache items that don’t change frequently or at all.
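To make the cache-first idea concrete, here is a minimal sketch that uses a Map in place of the browser cache and a fake network function in place of fetch. It illustrates the strategy only. It is not the Workbox implementation, it leaves expiration out, and createCacheFirst is a name I made up.

```javascript
// Build a cache-first handler around any async "network" function.
function createCacheFirst(fetchFromNetwork) {
    const cache = new Map(); // in-memory stand-in for a browser cache
    return async function handle(url) {
        if (cache.has(url)) {
            return cache.get(url); // cache hit: the network is never touched
        }
        const response = await fetchFromNetwork(url);
        cache.set(url, response); // remember it for next time
        return response;
    };
}

async function demoCacheFirst() {
    let networkCalls = 0;
    // Hypothetical network: counts how often it is used.
    const handler = createCacheFirst(async url => {
        networkCalls += 1;
        return "bytes for " + url;
    });
    const first = await handler("/images/logo.png");  // miss, goes to the network
    const second = await handler("/images/logo.png"); // hit, served from the cache
    return { first, second, networkCalls };
}

demoCacheFirst().then(result => console.log(result));
// { first: 'bytes for /images/logo.png', second: 'bytes for /images/logo.png', networkCalls: 1 }
```

Two requests, one trip to the network. That is why this strategy suits images and other files that rarely change.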
Notice that we have another runtime caching policy that doesn’t use cacheFirst as the type:
{
    urlPattern: /(?:\/)$/,
    handler: "staleWhileRevalidate",
    options: {
        cacheName: "html",
        expiration: {
            maxAgeSeconds: 60 * 60 * 24 * 7,
        },
    },
},
The above regular expression doesn’t reference a file extension. This is because we want to cache HTML and Hugo gives us pretty links rather than links with .html in the URL. We’re using the staleWhileRevalidate strategy which requests resources from the cache and network in parallel. This is where it is easy to go wrong. When you rebuild your Hugo site, you need the cache to be reset, otherwise the article list won’t update and your users will never be able to see new content. With the staleWhileRevalidate option, the user might receive the cached copy on the first request, but at least the second request will have the new content. If there is no network connection, the cache will continue to be used, unless it has expired.
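The “stale first, fresh second” behavior is easier to see in a small simulation. This sketch again uses a Map and a fake network function, so treat it as an illustration of the strategy rather than the Workbox code. The createStaleWhileRevalidate name and the build counter are hypothetical.

```javascript
// Build a stale-while-revalidate handler around any async "network" function.
function createStaleWhileRevalidate(fetchFromNetwork) {
    const cache = new Map(); // in-memory stand-in for a browser cache
    return async function handle(url) {
        // Always start a network request so the cache gets refreshed.
        const refresh = fetchFromNetwork(url).then(response => {
            cache.set(url, response);
            return response;
        });
        if (cache.has(url)) {
            // A failed refresh must not break an answer we already have.
            refresh.catch(() => {});
            return cache.get(url);
        }
        // Nothing cached yet, so wait for the network.
        return refresh;
    };
}

async function demoStaleWhileRevalidate() {
    let siteBuild = 1;
    const handler = createStaleWhileRevalidate(async url => url + " (build " + siteBuild + ")");
    const first = await handler("/blog/");  // nothing cached, comes from the network
    siteBuild = 2;                          // pretend the Hugo site was rebuilt
    const second = await handler("/blog/"); // stale copy, refresh runs in the background
    await new Promise(resolve => setTimeout(resolve, 0)); // let the refresh finish
    const third = await handler("/blog/");  // refreshed copy
    return [first, second, third];
}

demoStaleWhileRevalidate().then(result => console.log(result));
// [ '/blog/ (build 1)', '/blog/ (build 1)', '/blog/ (build 2)' ]
```

The request right after the rebuild still sees build 1, and the one after that sees build 2.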
So what does the output service worker look like when we run the build task?
The public/sw.js file might look something like this:
importScripts("https://storage.googleapis.com/workbox-cdn/releases/3.6.3/workbox-sw.js");
workbox.core.setCacheNameDetails({prefix: "thepolyglotdeveloper"});
workbox.skipWaiting();
workbox.clientsClaim();
self.__precacheManifest = [
    {
        "url": "/css/bundle.min.e3e69c65753e6ea4b292f69d3fb284c04c7eed7af4545c6a7522788761fb79c8.css",
        "revision": "4dd28b66ba7b7b3b3fed248793680f61"
    },
    {
        "url": "/fonts/fontawesome-webfont.eot",
        "revision": "674f50d287a8c48dc19ba404d20fe713"
    },
    {
        "url": "/fonts/FontAwesome.otf",
        "revision": "0d2717cd5d853e5c765ca032dfd41a4d"
    },
    {
        "url": "/js/bundle.min.b83ac6a78ebe029e00da398d8bd103a7897681fcd3e16d549eabce72e41d4dbf.js",
        "revision": "ca83a464db1f84698cc96e33ead4f931"
    }
].concat(self.__precacheManifest || []);
workbox.precaching.suppressWarnings();
workbox.precaching.precacheAndRoute(self.__precacheManifest, {
    "ignoreUrlParametersMatching": [/./]
});
workbox.routing.registerRoute(/(?:\/)$/, workbox.strategies.staleWhileRevalidate({ "cacheName":"html", plugins: [new workbox.expiration.Plugin({"maxAgeSeconds":604800,"purgeOnQuotaError":false})] }), 'GET');
workbox.routing.registerRoute(/\.(?:png|jpg|jpeg|gif|bmp|webp|svg|ico)$/, workbox.strategies.cacheFirst({ "cacheName":"images", plugins: [new workbox.expiration.Plugin({"maxEntries":1000,"maxAgeSeconds":31536000,"purgeOnQuotaError":false})] }), 'GET');
workbox.routing.registerRoute(/\.(?:mp3|wav|m4a)$/, workbox.strategies.cacheFirst({ "cacheName":"audio", plugins: [new workbox.expiration.Plugin({"maxEntries":1000,"maxAgeSeconds":31536000,"purgeOnQuotaError":false})] }), 'GET');
workbox.routing.registerRoute(/\.(?:m4v|mpg|avi)$/, workbox.strategies.cacheFirst({ "cacheName":"videos", plugins: [new workbox.expiration.Plugin({"maxEntries":1000,"maxAgeSeconds":31536000,"purgeOnQuotaError":false})] }), 'GET');
workbox.googleAnalytics.initialize({});
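If you want to sanity check which runtime cache a given URL would land in, you can try the four regular expressions from the generated file against a few sample URLs. This sketch assumes the patterns are tested against the full URL, query string included, which is worth confirming against your Workbox version. The example.com URLs and the cacheNameFor helper are hypothetical.

```javascript
// The four patterns and cache names from the generated service worker.
const runtimeRoutes = [
    { pattern: /(?:\/)$/, cacheName: "html" },
    { pattern: /\.(?:png|jpg|jpeg|gif|bmp|webp|svg|ico)$/, cacheName: "images" },
    { pattern: /\.(?:mp3|wav|m4a)$/, cacheName: "audio" },
    { pattern: /\.(?:m4v|mpg|avi)$/, cacheName: "videos" }
];

// These patterns do not overlap, so the lookup order does not matter here.
// null means no runtime route applies to the URL.
function cacheNameFor(href) {
    const route = runtimeRoutes.find(candidate => candidate.pattern.test(href));
    return route ? route.cacheName : null;
}

console.log(cacheNameFor("https://example.com/blog/my-post/"));         // html
console.log(cacheNameFor("https://example.com/images/logo.png"));       // images
console.log(cacheNameFor("https://example.com/podcast/episode-1.mp3")); // audio
// Under the full-URL assumption, a trailing query string defeats the "$" anchor.
console.log(cacheNameFor("https://example.com/blog/my-post/?utm_source=twitter")); // null
```

If that last case matters for your site, you may want a looser pattern for HTML.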
Having a sw.js file at the root of your public directory isn’t enough. You do need to tell your pages to use this file. In your theme, before the closing </body> tag, add the following:
<script>
    if("serviceWorker" in navigator) {
        window.addEventListener("load", () => {
            navigator.serviceWorker.register("/sw.js").then(swReg => {}).catch(err => {
                console.error('Service Worker Error', err);
            });
        });
    }
</script>
Remember, the sw.js file must be at the root of your public directory. If you have a custom configuration, make sure things are reflected in the gulpfile.js file and in the script for initializing the service worker. To be a fully compliant progressive web application (PWA), you’ll likely need to work on a manifest.json file at the root of your public directory, but at least the service worker logic is in place.
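The reason the file location matters is scope. By default a service worker only controls pages under the directory its script is served from. Here is a small sketch of that rule using the standard URL class. The example.com URLs and both helper names are hypothetical.

```javascript
// The default scope is the directory that contains the service worker script.
function defaultScope(scriptUrl) {
    return new URL("./", scriptUrl).href;
}

// A page is controlled when its URL falls under that scope.
function isControlled(pageUrl, scriptUrl) {
    return pageUrl.startsWith(defaultScope(scriptUrl));
}

console.log(defaultScope("https://example.com/sw.js"));    // https://example.com/
console.log(defaultScope("https://example.com/js/sw.js")); // https://example.com/js/

// At the root, blog posts are covered. Tucked away in /js/, they are not.
console.log(isControlled("https://example.com/blog/my-post/", "https://example.com/sw.js"));    // true
console.log(isControlled("https://example.com/blog/my-post/", "https://example.com/js/sw.js")); // false
```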
You just saw how to use Workbox in your Hugo project for generating service workers. As previously mentioned, one of the things service workers do (not the only one) is caching, which is part of becoming a progressive web application (PWA). It is a good idea to use service workers in your Hugo project because it is very easy to do and can help you score better when it comes to the Google testing and crawling tools.
Some things I wish I had known getting into service workers with Hugo:
If you are using service workers with a static generated website and you are doing something differently, let me know about it in the comments.