Lizt: An offline-ready full-stack collaborative web app in 200 lines of Clojure(Script)
Intro
This is the first installment in a series of blog posts - accompanied by planned screencasts - that introduces a proof-of-concept solution for sharing data across multiple devices, in a surprisingly tiny amount of code.
GitHub: https://github.com/indolamine/lizt
The to-do examples
You know TodoMVC.
It’s the de-facto demo app for web frontend and some full-stack technologies. It’s using a “business domain” that is easy to grasp (“todo list”), and is pretty helpful in comparing different stacks’ syntaxes. But I have to admit that I’ve always had some weird feelings towards it.
You see this over and over again: for demonstrating programming languages, libraries, stacks, it’s pretty common to see the “to-do list” being used as an example. At first sight it looks like it’s the web-adapted version of a “Hello, World!”. But this notion is misleading.
In our time and age “todo lists” are anything but simple. Any viable implementation will need to take multi-player and multi-device scenarios into consideration. Either it’s you working on some list-shaped thing on your computer and on your various mobile devices with the assumption of changes automagically propagating everywhere; or you actually work on lists in collaboration with other people.
Such is the case of going out shopping with a shared shopping list. Your partner grabs some bread, while you check off milk - and you both expect that you’ll see what the other did, reliably and quickly.
So, the actual complexity of the problem is one of data management and distribution: how to support concurrent activity on the same data set. The problem is fundamentally a distributed database problem, which is very far from trivial to solve well.
Demo applications (with the rare exceptions) typically fall back to providing or relying on REST-ish endpoints for creating, deleting, modifying items. This is obviously not enough, the requirement to collaborate (and ideally to work offline) makes it neccesary to implement some extra infrastructure around the notions of “cache”, “sync”, “conflict resolution” and other exotic stuff.
The as-is
But, this is 2023, right, so there has to be some kind of simple and easy solution to deal with these kinds of scenarios.
Well, yes and no.
Try creating a new Google Docs document online and shut down your network connection…
Try creating a Confluence page and go offline…
Try to change your Figma designs on a plane…
All of these are frustrating experiences. By default most “edit stuff on the web” applications will take the easy way out - editing will simply be disabled while you’re offline. Google Docs, which first made collaboration-on-the-web a reality for the masses still requires a browser plug-in for proper-ish offline support.
So, the problem is not trivial, but there is hope.
Approaches
Supporting “collaborative but also offline” features in applications seems to be done in two ways:
- through implementing and maintaining application-specific infrastructure for offline support and syncing with a central service or another application. On the web this usually goes hand-in-hand with some kind of WebSockets or HTTP calls + Server Sent Events framework to enable 2-way communication. This approach is widely used; just observe the network traffic of your applications to see examples.
- through the use of data structures that themselves enable disconnected autonomous use and a structured way of propagating changes and resolving conflicts. Git is probably the most prominent example. But, Git relies on some manuality, e.g in resolving merge conflicts.
While both approaches are viable, the one I find most attractive is something that could be seen as an evolution of the Git approach. This approach is the CRDT (Conflict-free Replicated Data-types) one.
CRDTs
The idea of CRDTs (which is a family of data type definitions with different capabilities) in a nutshell is the following:
- you have a way of representing data and the data’s history in a single, standard data structure that any application instance can maintain autonomously
- plus you have a well-defined way of consolidating two diverged (concurrently changed) versions of this data
- now, if you have two independently evolved versions of the data and they meet, they will resolve / merge to exactly one new version, without any ambiguity
The above means, that we pretty much de-complected the whole “sync” process from any kind of “data transfer” process. Now, if you give me your version of some data that I’ve also evolved myself, I can merge the two, as well as you. We’ll end up having the same data at the end. It’s totally irrelevant from this aspect when and how your changes reached me and my changes reach you. We might put them on a floppy disk and send it over with a pigeon or send it over zipped on a WebSocket channel, that won’t change the data management issue at all. Magic.
While you’re here, make sure to check out all the great stuff that Martin Kleppman did, not just for CRDT stuff.
CRDTs are not new, and some applications already use them extensively. But how about Clojure?
The landscape in Clojure
If you want to leverage the power of CRDTs in Clojure(Script), you have a range of options to consider
- go with one of the JavaScript libraries out there, such as Y.js . These can play nicely with ClojureScript, but you’ll need to do all the interop dance, translating between JS-ish data structures and Clojure ones. Painful. Also, if you’d like to include a “backend” or a central service in the sync processes in some transparent way (as in: the backend knows what data it help syncing) you are stuck with using JavaScript on the backend as well - practically ClojureScript on Node.js, which lands you on single-threaded async-everything land, which is not for everyone
- go with a Java implementation, which is cool on your backend, but good luck with making JavaScript in the browser play nice (or at all) with it
- go with the “canonical” Automerge that offers full-stack with JavaScript and Rust, meaning fancy interop on both sides if you want to stick with Clojure
- choose a Clojure implementation that will lead to a nice, full-stack pure Clojure(Script) solution. Yay. Of course that’s what I’d like to do
There are some libraries in Clojure that implement the CRDT approach:
- schism by Alex Redington
- cause by Chris Smothers
- evidentsystems/converge by Bobby Calderwood
Give all of them a spin, they all seem really nice, but for the purposes of this demo app I’ve chosen to go with Converge. My two main reasons were that I liked the atom-like ergonomics of the API; and I was blown away by a presentation of the author, Bobby, given at Strange Loop years ago, so I kind of trusted him to get this right.
Lizt
So, the app that I’ll use for demonstration is a really small thing. There is a tiny backend (in Clojure) and a tiny front-end app (in ClojureScript), in the same codebase. Both the backend and the front-end app’s instances will maintain state, branched off of a piece of original state hard-coded in the backend. They will do simple manipulations on the data, and whenever they can, they will present their changes to each other. Clients will send changes to the backend; in response, the backend will send all changes that it knows about to the clients. If some or all clients go offline, that’s fine; once they come back, the states will get automagically consolidated again. In addition to this, it’s not just the clients that can make changes to the state; the data is also transparent to the backend, so the backend code can participate in the loop as a first-class citizen, not only as a dumb relay.
The feature set of the app is extremely limited:
- you have one single list of items
- the list has a string title
- the items have a string title and a unique id
- the front-ends will display the title and the items; they’ll add some coloring to the items (to make it easier to grasp what’s going on with the sync)
- the front-ends have a button that will generate a new random item and add it to the list (client-side)
- the front-ends can remove items from the list
That’s all. You’ll notice that this PoC really focuses on the distributed-data infrastructure and nothing else. You won’t be able to change the items, there is no feature to type your own items, etc. All of those are taken away so that the focus remains on the data and data exchange as opposed to some boring UI mechanics.
Even some aspects of the sync are implemented in a very naive way; clients will try and do a sync in a timed manner, every 2 seconds. Again, the what of syncing matters way more than the how and on what communication channel for the purpose of this PoC.
The tech stack that I’ve used looks like this:
- Pedestal (on the JVM) as the server-side web foundation
- Reagent on the browser-side
- shadow-cljs as ClojureScript tooling
- transit for message encoding for the backend requests and responses, both clj and cljs
- ajax-cljs for managing backend communications from the browser
- and of course converge for the CRDT machinery
Grab the code from the repo at https://github.com/indolamine/lizt , play around with it, and watch the screencast once it’s available.
What’s next
In the next installments, we’ll:
- talk about CRDTs a bit more in-depth and play around with Converge
- build the front-end in Reagent and talk about some choices made
- build the back-end on top of Pedestal
- observe the behavior of the complete app and talk about the trade-offs involved
Cover photo
Photo by Natalia Y. on Unsplash
Lizt - Part 2: CRDT in Clojure with Converge »