September 6, 2009
Elko II: Against Statelessness (or, Everything Old Is New Again)
Preface: This is second of three posts on Elko, a server platform for sessionful, stateful web applications that I’m releasing this week as open source software. Part I, posted yesterday, presented the business backstory for Elko. This post presents the technical backstory: it lays out the key ideas that lead to the thing. Part III which will be posted tomorrow, presents a more detailed technical explication of the system itself.
It seems to be an article of faith in the web hosting and web server development communities that one of the most expensive resources that gets used up on a web server is open TCP connections. Consequently, a modern web server goes to great lengths to try to close any open TCP connection it can as soon as possible. Symptoms of this syndrome include short timeouts on HTTP Keep-Alive sessions (typically on the order of 10 seconds) and connection pool size limits on reverse proxies, gateways, and the like (indeed, a number of strange limits of various kinds seem to appear nearly any time you see the word “pool” used in any server related jargon). These guys really, really, really want to close that connection.
In the world as I see it, the most expensive thing is not an open connection per se. The cost of an open but inactive TCP connection is trivial: state data structures measured in the tens or hundreds of bytes, and buffer space measured in perhaps tens of kilobytes. Keeping hundreds of thousands of simultaneous inactive connections open on a single server (i.e., vastly more connections than the server would be able to service if they were all active) is really not that big a deal.
The expense I care about is the client latency associated with opening a new TCP connection. Over IP networks, just about the most expensive operation there is is opening a new TCP connection. In my more cynical moments, I imagine web guys thinking that since it is expensive, it must be valuable, so if we strive to do it as frequently as possible, we must be giving the users a lot of value, hence HTTP. However, the notable thing about this cost is that it is borne by the user, who pays it by sitting there waiting, whereas the cost of ongoing open connections is paid by the server owner.
So why do we have this IHMO upside down set of valuation memes driving the infrastructure of the net?
The answer, in part, lies in the architecture of a lot of server software, most notably Apache. Apache is not only the leading web server, it is arguably the template for many of its competitors and many of its symbionts. It is the 800 pound gorilla of web infrastructure.
Programming distributed systems is hard. Programming systems that do a lot of different things simultaneously is hard. Programming long-lived processes is hard. So a trick (and I should acknowledge up front that it’s a good trick) that Apache and its brethren use is the one-process-per-connection architecture (or, in some products, one-thread-per-connection). The idea is that you have a control process and a pool of worker processes. The control process designates one of the worker processes to listen for a new connection, while the others wait. When a new connection comes in, the worker process accepts the connection and notifies the control process, who hands off responsibility for listening to one of the other waiting processes from the pool (actually, often this handshake is handled by the OS itself rather than the control process per se, but the principle remains the same). The worker then goes about actually reading the HTTP request from the connection, processing it, sending the reply back to the client, and so on. When it’s done, it closes the connection and tells the control process to put it back into the pool of available worker processes, whence it gets recycled.
This is actually quite an elegant scheme. It kills several birds with one stone: the worker process doesn’t have to worry about coordinating with anything other than its sole client and the control process. The worker process can operate synchronously, which makes it much easier to program and to reason about (and thus to debug). If something goes horribly wrong and a particular HTTP request leads to something toxic, the worker process can crash without taking the rest of the world with it; the control process can easily spawn a new worker to replace it. And it need not even crash — it can simply exit prophylactically after processing a certain number of HTTP requests, thus mitigating problems due to slow storage leaks and cumulative data inconsistencies of various kinds. All this works because HTTP is a stateless RPC protocol: each HTTP request is a universe unto itself.
Given this model, it’s easy to see where the connections-are-expensive meme comes from: a TCP connection may be cheap, but a process certainly isn’t. If every live connection needs its own process to go with it, then a bunch of connections will eat up the server pretty quickly.
And, in the case of HTTP, the doctrine of statelessness is the key to scaling a web server farm. In such a world, it is frequently the case that successive HTTP requests have a high probability of being delivered to different servers anyway, and so the reasoning goes that although some TCP connects might be technically redundant, this will not make very much difference in the overall user experience. And some of the most obvious inefficiencies associated with loading a web page this way are addressed by persistent HTTP: when the browser knows in advance that it’s going to be fetching a bunch of resources all at once from a single host (such as all the images on a page), it can run all these requests through a single TCP session. This is a classic example of where optimization of a very common special case really pays off.
The problem with all this is that the user’s mental model of their relationship with a web site is often not stateless at all, and many web sites do a great deal of work in their presentation to encourage users to maintain a stateful view of things. So called “Web 2.0″ applications only enhance this effect, first because they blur the distinction between a page load and an interaction with the web site, and second because their more responsive Ajax user interfaces make the interaction between the user and the site much more conversational, where each side has to actively participate to hold up their end of the dialog.
In order for a web server to act as a participant in a conversation, it needs to have some short-term memory to keep track of what it was just talking to the user about. So after having built up this enormous infrastructure predicated on a stateless world, we then have to go to great effort and inconvenience to put the state back in again.
Traditionally, web applications keep the state in one of four places: in a database on the backend, in browser cookies, in hidden form fields on the page, and in URLs. Each of these solutions have distinct limitations.
Cookies, hidden form fields, and URLs suffer from very limited storage capacity and from being in the hands of the user. Encryption can mitigate the latter problem but not eliminate it — you can ensure that the bits aren’t tampered with but you can’t ensure that they won’t be gratuitously lost. These three techniques all require a significant amount of defensive programming if they are to work safely and reliably in any but the most trivial applications.
Databases can avoid the security, capacity and reliability problems with the other three methods, but at the cost of reintroducing one of the key problems that motivated statelessness in the first place: the need for a single point of contact for the data. Since the universe is born anew with each HTTP request, the web server that receives the request must query the database each time to reconstruct its model of the session, only to discard it again a moment later when request processing is finished. In essence, the web server is using its connection to the database — often a network connection to another server external to itself — as its memory bus. The breathtaking overhead of this has lead to a vast repertoire of engineering tricks and a huge after-market for support products to optimize things, in the form of a bewildering profusion of caches, query accelerators, special low-latency networking technologies, database clusters, high-performance storage solutions, and a host of other specialty products that frequently are just bandaids for the fundamental inefficiencies of the architecture that is being patched. In particular, I’ve been struck by the cargo-cult-like regard that some developers seem to have for the products of companies like Oracle and Network Appliance, apparently believing these products to possess some magic scaling juju that somehow makes them immune to the fundamental underlying problems, rather than merely being intensely market-driven focal points for the relentless incremental optimization of special cases.
(Before people start jumping in here and angrily pointing out all the wonderful things that databases can do, please note that I’m not talking about the many ways that web sites use databases for the kinds of things databases are properly used for: query and long term storage of complexly structured large data sets. I’m talking about the use of a database to hold the session state of a relatively short-term user interaction.)
And all of these approaches still impose some strong limitations on the range of applications that are practical. In particular, applications that involve concurrent interaction among multiple users (a very simple example is multi-user chat) are quite awkward in a web framework, as are applications that involve autonomous processes running inside the backend (a very simple example of this might be an alarm clock). These things are by no means impossible, but they definitely require you to cut against the grain.
Since the range of things that the web does do well is still mind bogglingly huge, these limitations have not been widely seen as pain points. There are a few major applications that fundamentally just don’t work well in the web paradigm and have simply ignored it, most notably massively multiplayer online games like World of Warcraft, but these are exceptions for the most part. However, there is some selection bias at work here: because the web encourages one form of application and not another, the web is dominated by the form that it favors. This is not really a surprise. What does bother me is that the limitations of the web have been so internalized by the current generation of developers that I’m not sure they are even aware of them, thus applications that step outside the standard model are never even conceived of in the first place.
Back in The Olden Days (i.e., to me, seems like yesterday, and, to many of my coworkers, before the dawn of time), the canonical networked server application was a single-threaded Unix program driven by an event loop sitting on top of a call to select(), listening for new connections on a server socket and listening for data I/O traffic on all the other open sockets. And that’s pretty much how it’s still done, even in the Apache architecture I described earlier, except that the population of developers has grown astronomically in the mean time, and most of those newer developers are working inside web frameworks that hide this from you. It’s not that developers are less sophisticated today — though many of them are, and that’s a Good Thing because it means you can do more with less — but it means that the fraction of developers who understand what they’re building on top of has gone way down. I hesitate to put percentages on it, lacking actual quantitivate data, but my suspicion is that it’s gone from something like “most of them” to something like “very, very few of them”.
But it’s worth asking what would happen if you implemented the backend for a web application like an old-fashioned stateful server process, i.e., keep the client interacting over the same TCP connection for the duration of the session, and just go ahead and keep the short-term state of the session in memory. Well, from the application developer’s perspective, that would be just terribly, terribly convenient. And that’s the idea behind Elko, the server and application framework this series of posts is concerned with. (Which, as mentioned in Part I, I’m now unleashing on the world as open source software that you can get here).
Now the only problem with the aforementioned approach, really, is that it blows the whole standard web scaling story completely to hell — that and the fact that the browser and the rest of the web infrastructure will try to thwart you at every turn as they attempt to optimize that which you are not doing. But let’s say you could overcome those issues, let’s say you had tricks to overcome the browser’s quirks, and had an awesome scaling story that worked in this paradigm. Obviously I wouldn’t have been going on at length about this if I didn’t have a punchline in mind, right? That will be the substance of Part III tomorrow.