Building with Cloudflare Workers
Note: This is an essay about how and why we are using Cloudflare Workers at my current startup, Atlas. Skip to part 2 if you want to read only about Workers.
Atlas is an operating system for the food and beverage industry, built with a few industry-specific considerations. You can read about the what, why, and how in my previous post here.
Technically speaking, it’s a system with an isolated database, isolated compute, and shared routing for each tenant, where a tenant is a restaurant group owner on the platform. Building a system that supports this requirement is not rocket science; we have seen these patterns and similar software a few times already, and architecting a SaaS with support for isolated databases and compute instances is pretty straightforward. The complexity came from the shared routing layer. We explored cloud/serverless functions and API-only routers, but each had its fair share of problems: cloud functions didn’t have persisted state, and apps required a lot of boilerplate code.
Our goal was to build a light routing layer that is fast; in short, we wanted to create something as fast and reliable as DNS for our network. Then we came across Cloudflare Workers [1], a serverless platform for deploying lightweight JavaScript applications, which looked like a good fit.
Cloudflare Workers lets you deploy serverless applications instantly across the globe, in 250 cities [2], and scales them automatically, so user requests are served from the nearest point of presence. Cloudflare Workers is powered by V8 isolates [3] and has zero cold start time [4]. V8 isolates are lightweight contexts that group variables with the code allowed to mutate them.
Cloudflare announced the Workers platform back in 2017 and has made significant improvements since then [5].
In this next part, I’ll focus more on how to piece all these things together to build a complete app using Workers.
Compute (Workers + Workers Unbound)
Workers are lightweight V8 isolates that can process every request; think of a Worker as a cloud function that runs on a platform powered by V8, a JavaScript engine.
Each Worker runs on a URL that you specify and gets a *.workers.dev domain if you configure it that way. A Worker is invoked when an HTTP request is made to that URL: Cloudflare intercepts the call and routes the request to the Worker for processing. A Worker can listen for fetch events (HTTP) and scheduled events (cron jobs), and it answers a fetch event with a Response.
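As a rough illustration of the two event types, here is a minimal sketch in the Service Worker syntax. The names handleRequest and handleScheduled and the /health path are made up for this example; only addEventListener, respondWith, waitUntil, and scheduledTime come from the Workers runtime.

```javascript
// Minimal sketch of a Worker (Service Worker syntax).
// handleRequest, handleScheduled, and the /health path are hypothetical.

// Turn an incoming HTTP request into a Response.
async function handleRequest(request) {
  const url = new URL(request.url);
  if (url.pathname === '/health') {
    return new Response('ok', { status: 200 });
  }
  return new Response(`No route for ${url.pathname}`, { status: 404 });
}

// Run on a cron trigger; scheduledTime is milliseconds since the epoch.
async function handleScheduled(scheduledTime) {
  console.log(`cron fired at ${new Date(scheduledTime).toISOString()}`);
}

// The Workers runtime provides a global addEventListener. The typeof guard
// only lets this file load outside the runtime, for example in a local test.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', (event) => {
    event.respondWith(handleRequest(event.request));
  });
  addEventListener('scheduled', (event) => {
    event.waitUntil(handleScheduled(event.scheduledTime));
  });
}
```

Keeping the handlers as plain functions, separate from the event registration, also makes them easy to call directly from a test.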
Each Worker has its own variables and encrypted secrets, and these are available inside the script as global variables.
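To make the "global variables" point concrete, here is a small sketch of checking a request against a secret. API_TOKEN is a hypothetical secret name chosen for this example, and isAuthorized is an illustrative helper, not part of the Workers API.

```javascript
// Sketch: reading a secret that the Worker sees as a global variable.
// API_TOKEN is a hypothetical secret name; nothing in this post defines it.

function isAuthorized(request) {
  // typeof avoids a ReferenceError when the secret is not configured.
  if (typeof API_TOKEN === 'undefined') {
    return false;
  }
  const header = request.headers.get('Authorization') || '';
  return header === `Bearer ${API_TOKEN}`;
}
```

Because the value is just a global, the script needs no lookup call; the trade-off is that a missing secret only shows up at runtime, which is why the sketch checks for it explicitly.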
A program gets only 10ms of CPU time per request on a standard Worker [6], which I think is not enough for every task. For applications that need longer execution times, Cloudflare offers Workers Unbound, an extension of Workers. It’s a paid feature.
Workers belong to a zone (a domain), which makes it slightly challenging to separate dev, staging, and production environments. The hack is to host each environment under a different domain.
While a Worker instance can handle multiple requests, including concurrent ones, in a single-threaded event loop, there’s no guarantee that any two requests will land in the same instance. That is why it is advised not to maintain any state inside the Worker event handler, so you need a way to store things.
Storage
Cloudflare Workers KV powers storage in Cloudflare Workers. Workers KV is a global, low-latency, key-value datastore optimized for read-heavy applications.
One caveat, though: updating a KV record from the dashboard is a bit buggy, and the whole UX needs to improve quite a lot. Changes may also take up to 60 seconds to propagate to all other edge locations. Options are limited if you want to have access to KV namespaces.
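For a routing layer like ours, the read-heavy pattern looks roughly like the sketch below. The key layout (tenant hostname as key, a JSON value with an origin field), the resolveUpstream helper, and the ROUTES binding name are all assumptions made for illustration; only the get call with { type: 'json' } is the KV API itself.

```javascript
// Sketch: tenant lookup backed by Workers KV.
// Hypothetical layout: key = tenant hostname, value = JSON {"origin": "..."}.

// kv is a KV namespace binding (or any object with a compatible get method).
async function resolveUpstream(kv, requestUrl) {
  const url = new URL(requestUrl);
  // { type: 'json' } asks KV to parse the stored value; a missing key gives null.
  const tenant = await kv.get(url.hostname, { type: 'json' });
  if (tenant === null) {
    return null;
  }
  // Keep the original path and query, swap in the tenant's origin.
  return new URL(url.pathname + url.search, tenant.origin).toString();
}

// Inside a Worker, with a namespace bound as ROUTES (hypothetical name):
//   const upstream = await resolveUpstream(ROUTES, request.url);
//   return upstream
//     ? fetch(new Request(upstream, request))
//     : new Response('Unknown tenant', { status: 404 });
```

Taking the namespace as a parameter, instead of reading the global directly, means the lookup can be exercised with an in-memory stand-in. Given the propagation delay mentioned above, a freshly written route may not be visible at every edge location right away.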
Durable Objects are another way to handle storage in Workers; they provide low-latency coordination and consistent storage. Durable Objects are named instances of a class you define. As with a class in object-oriented programming, the class defines the methods and data a Durable Object can access.
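Here is a minimal sketch of what such a class can look like: a counter that keeps its value in the object’s own storage. TenantCounter and the 'count' key are hypothetical names. In a real deployment the class also has to be exported from a module-syntax Worker and bound in the configuration, which this sketch leaves out.

```javascript
// Sketch of a Durable Object class: a counter with consistent storage.
// TenantCounter and the 'count' key are hypothetical names.

class TenantCounter {
  constructor(state, env) {
    // state.storage is this object's private, consistent key-value store.
    this.state = state;
    this.env = env;
  }

  // Each request to this named instance increments and returns the count.
  async fetch(request) {
    const current = (await this.state.storage.get('count')) || 0;
    const next = current + 1;
    await this.state.storage.put('count', next);
    return new Response(String(next));
  }
}
```

All requests for a given name reach the same instance, which is what makes a read-then-write like this workable here, unlike state kept in a plain Worker.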
Operations
Development and Deployment
Developing on the Workers platform is relatively straightforward: it supports plain JavaScript and Webpack builds, and it supports templates, any of which you can reuse. We can preview Workers on a temporary URL and fully deploy them to the edge using the Wrangler CLI [7].
You can create multiple environments, bind different KV namespaces, and add secrets using the wrangler.toml configuration file. See the Workers configuration notes here to learn how deeply you can configure Workers.
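As an illustration, a configuration with two extra environments might look like the fragment below. It assumes the Wrangler 1 style of wrangler.toml; key names may differ in newer releases. Every name, ID, and route here is a placeholder, not our real setup.

```toml
# wrangler.toml (sketch; every name, ID, and route below is a placeholder)
name = "atlas-router"
type = "javascript"
account_id = "<your-account-id>"
workers_dev = true

# KV binding for the default (dev) environment; ROUTES is a hypothetical name.
kv_namespaces = [
  { binding = "ROUTES", id = "<dev-namespace-id>" }
]

[env.staging]
name = "atlas-router-staging"
kv_namespaces = [
  { binding = "ROUTES", id = "<staging-namespace-id>" }
]

[env.production]
name = "atlas-router-production"
workers_dev = false
zone_id = "<your-zone-id>"
route = "example.com/*"
kv_namespaces = [
  { binding = "ROUTES", id = "<production-namespace-id>" }
]
```

With a file like this, `wrangler publish --env staging` targets the staging environment, and `wrangler secret put API_TOKEN --env staging` stores a secret for it (API_TOKEN again being a made-up name). Secrets are kept out of the file itself.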
There’s no out-of-the-box solution for testing Workers, and it can get tricky to test KV and secret implementations thoroughly. I found Miniflare [8] quite helpful for testing Workers.
Logging and tracing
Logging had long been a problem in Workers, but support for console.log was enabled recently, which means we can stream all the logs to debug the code. We can stream the logs from Workers on the dashboard or using the CLI.
You can use logging tools like Sentry for any production logging, and there are many useful open-source logging modules to serve this purpose.
Community
I love Cloudflare’s community! It’s always fun to hang out in their Discord chat rooms [9] and learn from everyone there. The blog posts and the constant stream of examples make it super easy for me to figure things out.
Apart from it being a good fit for our platform, there are a few more things I like about Workers.
Faster deployments
Deploying Workers is insanely fast. Period.
Our CI/CD times have improved significantly, and we have had less friction working with Workers so far.
Easier onboarding
Because it’s simple and modularized, onboarding new engineers and team members onto apps built using Workers is relatively straightforward.
Continuous improvements from Cloudflare
I started using Workers when it was still in beta, and since then, they have launched a ton of improvements (profiling, logging) that have made all of it better. I enjoy seeing continuous, smaller improvements like these.
Fun fact: I hosted this site on Workers!
Closing thoughts
Routing is a primary layer of our infrastructure, and relying entirely on Cloudflare might create a single point of failure: we may go down with Cloudflare. There was a recent Cloudflare outage where the dashboard was not accessible for a few minutes, which caused some issues on our end.
That said, it’s something we can replicate on any platform, and Cloudflare seems like a good fit so far. The hack was to run the same code on a different server, point the A record to that machine, and use it as a fallback server.
As an early adopter of the latest technology, you have to live with a few limitations. As I already mentioned, the dashboard experience needs to improve quite a bit: updating KV records and creating new tokens are not intuitive and are sometimes buggy.
Our bet on Workers seems to be working well so far. It addresses all the limitations we faced with other solutions, and we are using this pattern for all our routing, including handling webhooks and external integrations.
Have any of you tried Workers yet? Please feel free to reach out to me (email) if you have any questions or ideas; I’ll be happy to take your feedback, help, and discuss it further.
Notes