Bigcanvas: Concurrency

Core to the idea of Bigcanvas is that it’s a shared space where everyone can draw at the same time. Just as on a real canvas, this means people can interfere with each other. Properly handling this, and making sure that everybody’s brush strokes make it onto the canvas, turned out to be a fairly tricky problem.

In the old Bigcanvas, I was only doing black-on-white line drawings, so the server would just concatenate your brush strokes with those that were already there and send them on to all clients viewing that tile. In the new Bigcanvas, however, the server doesn’t know about brush strokes; it only knows about PNG files, so there is no such thing as concatenation.

Brute force

The simplest approach is to just make the client send an updated tile to the server in the form of a PNG image. This works fine as long as only one person is drawing to a given tile at a time, but what if there is more than one? In that case, one person’s changes will overwrite the tile just uploaded by the other, effectively undoing their changes. Very frustrating! This is the classic lost update problem.
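To see the failure concretely, here’s a minimal sketch of such a naive upload endpoint. An in-memory map stands in for the real database, and the route and names are illustrative, not how Bigcanvas actually does it:

```go
package main

import (
	"io"
	"net/http"
	"sync"
)

var (
	mu    sync.Mutex
	tiles = map[string][]byte{} // tile key -> PNG bytes (stand-in for the database)
)

// handleTileUpload blindly stores whatever PNG the client sends.
// If two people drew to the same tile concurrently, whichever upload
// arrives second silently erases the other's strokes.
func handleTileUpload(w http.ResponseWriter, r *http.Request) {
	blob, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	mu.Lock()
	tiles[r.URL.Path] = blob // last writer wins
	mu.Unlock()
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	http.HandleFunc("/tiles/", handleTileUpload)
	http.ListenAndServe(":8080", nil)
}
```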

The simplest solution to this would be locking. When you start drawing to a tile, you tell the server to lock it, and the server forbids anyone else from drawing to that tile concurrently. This alleviates the problem, but doesn’t make it go away altogether: what if two people try to claim the lock at the same time? The server will grant it to only one of them, but by then, the user is already seeing their (about to be denied) brush strokes on their screen. Also, how do you tell the user that they can’t draw because somebody else is drawing, without sounding like a primitive app from the previous century?
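For illustration, such a per-tile lock could be built on Redis’s SET ... NX EX primitive. The key scheme, the TTL, and the go-redis client are all assumptions of this sketch:

```go
package main

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// tryLockTile attempts to claim a tile. SET ... NX EX succeeds only if
// nobody else holds the lock; the TTL ensures a crashed client can't
// keep a tile locked forever.
func tryLockTile(ctx context.Context, rdb *redis.Client, tileID, clientID string) (bool, error) {
	return rdb.SetNX(ctx, "lock:"+tileID, clientID, 30*time.Second).Result()
}

// unlockTile releases the lock. (A production version would verify that
// clientID still owns the lock before deleting, e.g. via a Lua script.)
func unlockTile(ctx context.Context, rdb *redis.Client, tileID string) error {
	return rdb.Del(ctx, "lock:"+tileID).Err()
}
```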

Incremental updates

So let’s think back to the solution used by the old Bigcanvas, which just concatenated strokes in whatever order they arrived on the server. Can we “concatenate” PNG files? Of course! We just send a “delta” in the form of a largely transparent image, which only contains the brush stroke that was just drawn. The server overlays this on top of the tile image from the database, and stores the result back. As long as the database allows us to do this atomically, we are fine. Redis’s limited transaction support is just enough to make this work, and indeed, this is how I initially implemented it.
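This read-modify-write cycle maps naturally onto Redis’s optimistic WATCH/MULTI/EXEC transactions. A minimal sketch of the server side, assuming a go-redis client and tile:<id> keys holding the PNG blobs (both choices are mine):

```go
package main

import (
	"bytes"
	"context"
	"image"
	"image/draw"
	"image/png"

	"github.com/redis/go-redis/v9"
)

// applyDelta overlays a mostly-transparent delta image onto the stored
// tile. WATCH makes the read-modify-write optimistic: if anyone else
// writes the key between our GET and EXEC, the transaction fails and we
// simply redo it. Assumes the tile already exists under "tile:<id>".
func applyDelta(ctx context.Context, rdb *redis.Client, tileID string, delta image.Image) error {
	key := "tile:" + tileID
	txn := func(tx *redis.Tx) error {
		blob, err := tx.Get(ctx, key).Bytes()
		if err != nil {
			return err
		}
		base, err := png.Decode(bytes.NewReader(blob))
		if err != nil {
			return err
		}
		// Composite the delta over the base (Porter-Duff "over").
		dst := image.NewRGBA(base.Bounds())
		draw.Draw(dst, dst.Bounds(), base, image.Point{}, draw.Src)
		draw.Draw(dst, dst.Bounds(), delta, image.Point{}, draw.Over)
		var buf bytes.Buffer
		if err := png.Encode(&buf, dst); err != nil {
			return err
		}
		// The SET only commits if the key is untouched since WATCH.
		_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
			pipe.Set(ctx, key, buf.Bytes(), 0)
			return nil
		})
		return err
	}
	for {
		err := rdb.Watch(ctx, txn, key)
		if err != redis.TxFailedErr {
			return err // nil on success, or a real error
		}
		// Somebody else wrote the tile first; try again.
	}
}
```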

However, this leads to complications on the client. Besides the tile’s base image, call it B, we’ll have to separately keep track of the new delta N that has not been sent to the server yet; on the screen, we show the two on top of each other as B + N (note that this + is not commutative). At some point, we send N out, and the simplest thing to do is just to apply it to the base image at the same time, B += N, optimistically assuming that the network will remain up and the server will accept it.
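That non-commutative + is just “over” compositing. The real client runs in the browser, but Go’s image/draw package expresses the idea compactly; the function names here are illustrative:

```go
package main

import (
	"image"
	"image/draw"
)

// show renders what the user sees: the base B with the unsent delta N
// composited on top. draw.Over is Porter-Duff "over": opaque pixels of
// N cover B, transparent ones let B show through. Swapping the operands
// yields a different image, which is why this + is not commutative.
func show(b, n image.Image) *image.RGBA {
	out := image.NewRGBA(b.Bounds())
	draw.Draw(out, out.Bounds(), b, image.Point{}, draw.Src)
	draw.Draw(out, out.Bounds(), n, image.Point{}, draw.Over)
	return out
}

// flush performs B += N at the moment N is sent to the server.
func flush(b *image.RGBA, n image.Image) {
	draw.Draw(b, b.Bounds(), n, image.Point{}, draw.Over)
}
```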

However, what if we send our delta and apply it locally, but somebody else drew to the same tile in the meantime? The server now has an interfering delta I on top of B and will obtain B + I + N. How will our client be notified? To solve this, we can use versioning. Every time the server applies a delta, it also increments the tile’s version number by 1. When a client sends a delta, it also sends the version number of the base tile PNG that the delta is being applied to. The server compares that number to the one in storage. If they match, good, the client was up to date. If they don’t match, it means the user drew on top of something that was outdated, and we need to inform them. In that case, the server applies the delta as usual, but sends the new B along in its response. Upon receiving the new base tile from the server, the client replaces its currently known base tile with the new one. Problem solved!
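A sketch of what that versioned exchange might look like. The wire format, the in-memory store, and the overlay helper are all inventions standing in for the real ones:

```go
package main

import "sync"

// Hypothetical wire format: the delta PNG plus the version of the base
// tile the client applied it to.
type DeltaRequest struct {
	TileID      string
	BaseVersion int64
	DeltaPNG    []byte
}

type DeltaResponse struct {
	NewVersion int64
	NewBasePNG []byte // set only if the client's base was stale
}

type tile struct {
	png     []byte
	version int64
}

var (
	mu    sync.Mutex
	tiles = map[string]*tile{} // in-memory stand-in for the Redis store
)

// overlay stands in for the decode / draw.Over / encode routine from
// the earlier sketch.
var overlay func(base, delta []byte) ([]byte, error)

func applyVersionedDelta(req DeltaRequest) (DeltaResponse, error) {
	mu.Lock()
	defer mu.Unlock()
	t, ok := tiles[req.TileID]
	if !ok {
		t = &tile{} // a blank, fully transparent tile at version 0
		tiles[req.TileID] = t
	}
	merged, err := overlay(t.png, req.DeltaPNG)
	if err != nil {
		return DeltaResponse{}, err
	}
	t.png = merged
	t.version++
	resp := DeltaResponse{NewVersion: t.version}
	if req.BaseVersion != t.version-1 {
		// The client drew on an outdated base: apply the delta anyway,
		// but piggyback the merged tile so it can resynchronize.
		resp.NewBasePNG = merged
	}
	return resp, nil
}
```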

One more problem to think about. What if there is a network glitch? If the client sends a delta but doesn’t get a response from the server, it has no way of resending the delta: it was already merged into the base image B the moment it was sent.

The solution to that is to keep not one in-progress delta on top of the base tile, but two. One, call it P for ‘pending’, for changes that have been sent but not yet acknowledged by the server, and N for new changes that haven’t yet been sent. At any time, we’ll be showing B + P + N on the screen. We’ll never have more than one pending delta; we simply don’t send a new one until the previous one has been acknowledged (and either merged into B or supplanted by a piggybacked update to B). This has the pleasant side effect of automatic throttling in case the server gets slow.

In case the network goes down, sending P will fail. Upon noticing that failure, the client merges N into P just so any new changes are included, and retries. You can keep drawing forever, and all your changes will just be batched up in P, ready for when the network comes back.
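Here’s a sketch of that client-side bookkeeping; the TileClient type and the callback names are invented for illustration:

```go
package main

import (
	"image"
	"image/draw"
)

// TileClient holds one tile's state. The screen always shows B + P + N,
// and at most one delta (P) is ever in flight.
type TileClient struct {
	base    *image.RGBA // B: last server-acknowledged state
	pending *image.RGBA // P: sent, awaiting acknowledgement (nil if none)
	fresh   *image.RGBA // N: drawn locally, not yet sent
}

// trySend promotes N to P if nothing is in flight. Refusing to send
// while a delta is pending doubles as throttling when the server slows.
func (t *TileClient) trySend(send func(delta *image.RGBA)) {
	if t.pending != nil || t.fresh == nil {
		return
	}
	t.pending, t.fresh = t.fresh, nil
	send(t.pending)
}

// onAck handles the server's acknowledgement of P. If the server
// piggybacked a fresh base (we were out of date), adopt it; otherwise
// merge P into B locally.
func (t *TileClient) onAck(newBase *image.RGBA) {
	if newBase != nil {
		t.base = newBase
	} else {
		draw.Draw(t.base, t.base.Bounds(), t.pending, image.Point{}, draw.Over)
	}
	t.pending = nil
}

// onSendFailed handles a network failure: fold N into P so the retry
// carries everything drawn so far.
func (t *TileClient) onSendFailed() {
	if t.fresh != nil {
		draw.Draw(t.pending, t.pending.Bounds(), t.fresh, image.Point{}, draw.Over)
		t.fresh = nil
	}
	// ...then retry sending t.pending later.
}
```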

Full updates

The above approach may sound good, but it’s still not bulletproof. It is possible, though unlikely, that the delta was actually applied on the server side, but the connection was dropped before the client received the acknowledgement. In that case, the retry will send the delta a second time, and the server will be none the wiser and apply it a second time. We could work around this by storing a list of identifiers of recently applied deltas on the server side, but I think that is going too far at the moment for this rare edge case.

The root of this last problem is that the server doesn’t store anything besides the current image and its version number: it doesn’t know what exactly the client is expecting it to do. The client, on the other hand, does know. Maybe, then, we shouldn’t rely on the server to decide whether to apply a delta, but let the client do all the work with up-to-date knowledge? Let’s consider how this would work.

Instead of sending deltas, the client sends the full tile. Obviously, this means the user’s changes must have been applied to an up-to-date base tile; otherwise we’d be overwriting someone else’s changes. If the server detects a version number mismatch, it refuses the update and piggybacks the current tile on its response. The client then has a chance to re-apply its delta and try again.
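The server side of this is essentially a compare-and-swap. A sketch, again with an in-memory store standing in for Redis and an invented wire format:

```go
package main

import "sync"

type FullTileRequest struct {
	TileID      string
	BaseVersion int64
	PNG         []byte // the complete tile, already composited client-side
}

type FullTileResponse struct {
	Accepted bool
	Version  int64
	PNG      []byte // current tile, piggybacked only on refusal
}

type tile struct {
	png     []byte
	version int64
}

var (
	mu    sync.Mutex
	tiles = map[string]*tile{}
)

// putFullTile accepts the new blob only if it was derived from exactly
// the version we hold. No PNG decoding, alpha blending, or re-encoding
// happens on the server at all.
func putFullTile(req FullTileRequest) FullTileResponse {
	mu.Lock()
	defer mu.Unlock()
	t, ok := tiles[req.TileID]
	if !ok {
		t = &tile{}
		tiles[req.TileID] = t
	}
	if req.BaseVersion != t.version {
		// Refuse, and hand back the current state so the client can
		// re-apply its delta and retry.
		return FullTileResponse{Accepted: false, Version: t.version, PNG: t.png}
	}
	t.png = req.PNG
	t.version++
	return FullTileResponse{Accepted: true, Version: t.version}
}
```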

The benefit is that this approach is bulletproof. We only accept the update to version n+1 if we know for sure that it was based upon version n, not on some previous version m < n.

Another benefit is that the server has less work to do: it can just store the PNG blob in the database, and doesn’t need to do any decoding or alpha blending.

The drawback is an increase in bandwidth usage. Deltas will typically be monochrome and largely transparent, so they compress well. Full tile images, probably not so much, especially as drawings get more complex. And in case of concurrent writes, the situation gets even worse, because the client now has to upload the entire tile twice.

Another drawback is starvation. If two people are drawing to the same tile, but one is on a fast connection and the other on a slow one, updates from the person on the slow connection might forever be pre-empted by those of the other person. Since someone will eventually stop drawing, I don’t think it’d be much of an issue.

Conclusion

So now we have two algorithms: one that burns CPU on the server to save bandwidth, and one that spends bandwidth but saves server CPU. Which to choose? There are essentially two factors to take into account: user experience and cost.

User experience should be the same for both algorithms, except for latency. Remember that the delta algorithm requires less upstream bandwidth, but is this saving really necessary? I sampled a couple of representative-looking 256×256 PNG files on the web and concluded that a typical tile would be about 50 kB. Assuming an update rate of once per second, mostly to one tile, that amounts to 50 kB × 8 = 400 kbit/s of upstream bandwidth. Clearly no problem even on a slow broadband connection, and even an average mobile connection can pull 5 Mbit/s nowadays. So saving bandwidth is not going to make things appreciably faster for most users.

The cost argument turns out to be the decisive one. Looking at two large cloud hosting providers, Amazon EC2 and Google Compute Engine, both offer unlimited ingress traffic for free. (Why? No idea.) That means CPU becomes the limiting factor here. How much is the CPU difference? I wrote a little benchmark to find out, and ran it on my 7-year-old 32-bit laptop. The results might not be entirely representative of more modern systems, but they should be good enough for now.
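In outline, the benchmark compares the two paths: storing the uploaded blob as-is versus decode, blend, re-encode, store. Something like this, using Go’s testing package (the file names and the store stub are placeholders):

```go
// tiles_bench_test.go — run with: go test -bench .
package tiles

import (
	"bytes"
	"image"
	"image/draw"
	"image/png"
	"os"
	"testing"
)

func store(key string, blob []byte) {
	// A Redis SET in the real benchmark; a no-op here.
}

// The cheap path: store the client's PNG blob untouched.
func BenchmarkStoreBlob(b *testing.B) {
	blob, _ := os.ReadFile("testdata/tile.png")
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		store("tile:0:0", blob)
	}
}

// The expensive path: load, decode, apply the delta, encode, store.
func BenchmarkApplyDelta(b *testing.B) {
	tileBlob, _ := os.ReadFile("testdata/tile.png")
	deltaBlob, _ := os.ReadFile("testdata/delta.png")
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		base, _ := png.Decode(bytes.NewReader(tileBlob))
		delta, _ := png.Decode(bytes.NewReader(deltaBlob))
		dst := image.NewRGBA(base.Bounds())
		draw.Draw(dst, dst.Bounds(), base, image.Point{}, draw.Src)
		draw.Draw(dst, dst.Bounds(), delta, image.Point{}, draw.Over)
		var buf bytes.Buffer
		png.Encode(&buf, dst)
		store("tile:0:0", buf.Bytes())
	}
}
```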

Storing a 50 kB PNG file directly into Redis takes about 0.7 ms, whereas loading a 50 kB PNG from Redis, decoding it, applying a delta, encoding it, and storing it back takes a whopping 240 ms on the same machine. That is a big difference. In particular, at one update per second per user, it means that a single CPU core will be able to support only about four people drawing at the same time. It’ll be hard to make a profit that way…

Why is it so slow? Profiling to the rescue! The Go runtime comes with a built-in sampling profiler, so it’s really easy to get good data on where your time is spent. (You can even enable it in production servers to get live data!) As it turns out, decoding the stored tile and delta PNG takes 5% each, drawing the delta onto the PNG takes 37%, and encoding the result takes 38%. The remaining 15% is spent on a conversion step that I could optimize away, but the rest of the work would be here to stay. Looks like the stock image/png and image/draw packages could do with some optimization.
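For reference, enabling the profiler in a server is just one import plus an HTTP endpoint:

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on DefaultServeMux
)

func main() {
	// The profiling endpoints ride along on the normal HTTP server;
	// grab a 30-second CPU profile with:
	//   go tool pprof http://localhost:8080/debug/pprof/profile
	http.ListenAndServe(":8080", nil)
}
```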

So, in the end, it’s a no-brainer. Be wasteful with bandwidth, and offload the hard work to the client (which is doing it anyway, just to display real-time results). Premature optimization may be the root of all evil, but this is actually more like designing for performance, and I’m glad I thought this through beyond my first implementation.