January 8, 2018
Pool Abuse
Early in my tenure at PayPal I had a conversation with somebody who was having some kind of problem with a cromfelter pool (he didn’t actually say “cromfelter”, that’s a placeholder word I just made up; he was talking about a specific thing, but which specific thing he was talking about is of no concern here). I had a moment of unease and realized that my career has conditioned me to see any use of the word “pool” as a red flag — or maybe a yellow flag: it’s not that it means something is necessarily wrong, but it signals something wrong often enough that it’s probably worth taking a closer look, because the odds of finding something a bit off are reasonably good.
First, I should explain what I mean by the word “pool” in the context of this discussion. I mean some collection of some resource that is needed frequently but usually temporarily for some purpose by various entities. Things that are pooled in this sense include memory, disk space, IP addresses, port numbers, threads, worker processes, data connections, bandwidth, and cache space, as well as non-computational resources like cars, bicycles, bricklayers, journalists, and substitute teachers.
The general pattern is that a collection of resources of some type are placed under the control of an entity we’ll call the “pool manager”. The pool manager holds onto the collection (the “pool”) and regulates access to it. When some other entity, which we’ll call the “customer”, needs one of these resources (or some quantity of resources — the concept is basically the same regardless of whether the resources in question are discrete or not), they request it from the pool manager. The pool manager removes an instance of the resource from the pool and hands it to the customer. The customer then makes use of the resource, treating it as their own for the duration. When the customer is finished with the resource, they hand it back to the pool manager, who puts it back into the pool. There are many variations of this pattern, depending on how (or even whether) the pool manager keeps track of which customers have which resources, whether the pool manager has the power to take the resources back unilaterally, what happens when the pool is exhausted (create more, buy more, just say no, etc.), what happens when the pool gets overly full (destroy some, sell some, just keep them, etc.), whether there is some notion of priority, or pricing, or grades of resource (QOS, etc.) and so on. The range of possible variations is basically infinite.
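To make the mechanics concrete, here is a minimal sketch in Java of a pool manager whose policy on exhaustion is "create more" and whose policy on return is "just keep them"; the names (ResourcePool, acquire, release) and those policy choices are illustrative assumptions, not part of the pattern itself.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.function.Supplier;

    // A minimal pool manager: it holds a collection of idle resources, hands one
    // out on acquire(), and takes it back on release().
    class ResourcePool<T> {
        private final Deque<T> idle = new ArrayDeque<>();
        private final Supplier<T> factory;  // how to make a new resource when needed

        ResourcePool(Supplier<T> factory, int initialSize) {
            this.factory = factory;
            for (int i = 0; i < initialSize; i++) {
                idle.push(factory.get());
            }
        }

        // The customer asks for a resource; this version creates a new one when
        // the pool is exhausted, but "just say no" or blocking are equally valid policies.
        synchronized T acquire() {
            return idle.isEmpty() ? factory.get() : idle.pop();
        }

        // The customer hands the resource back and the manager returns it to the pool.
        synchronized void release(T resource) {
            idle.push(resource);
        }
    }

A customer calls acquire(), treats the result as its own for the duration, and hands it back with release(); every variation mentioned above (reclamation, priorities, pricing, grades of resource) is elaboration on this same skeleton.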
As a generic pattern, there is nothing intrinsically objectionable about this. There are many resources that necessarily have to be managed like this somehow if we are to have any hope of arbitrating among the various entities that want to use them. (There’s a large literature exploring the many varied and interesting approaches to how you do this arbitrating, but that’s outside the scope of this essay.) However, in my experience it’s fairly common for the pool pattern to be seized upon as the solution to some problem, when it is merely a way to obtain symptomatic relief without confronting the underlying reality that gives rise to the problem in the first place, thus introducing complication and other costs that might have been avoided if we had actually attacked the fundamental underlying issues directly. Of course, sometimes symptomatic relief is the best we can do. If the underlying problem, even if well understood, lies in some domain over which we have no control ourselves (for example, if it’s in a component provided by an outside vendor who we are powerless to influence), we may just be stuck. But part of my purpose here is to encourage you to question whether or not this is actually your situation.
An example of a reasonable use of the pool pattern is a storage allocator. Memory is a fundamental resource. It makes sense to delegate its management to an external entity that can encapsulate all the particulars and idiosyncrasies of dealing with the operating system and the underlying memory management hardware, master the complexities of increasingly sophisticated allocation algorithms, arbitrate among diverse clients, and adjust to the changing demands imposed by changing use patterns over time. Note that one of the key indicators of the pool pattern’s appropriateness is that the resource is truly fundamental, in the sense that it isn’t a higher level abstraction that we could assemble ourselves if we needed to. Memory certainly fits this criterion — remarkably few applications are in a position, upon discovering they are low on memory, to place an order with NewEgg for more RAM and have it shipped to them and installed in the computer they are running on.
An example of a more questionable use of the pool pattern is a thread pool, where we maintain a collection of quiescent computational threads from which members are pulled to do work as work arrives to be done (such as responding to incoming HTTP requests). A thread is not a fundamental resource at all, but something we can create on demand according to our own specifications. The usual motivation for thread pools is that the cost of launching a new thread is seen, rightly or wrongly, to be high compared to the cost of retasking an existing thread. However, though a thread pool may address this cost, there can be other issues that it then introduces.
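For concreteness, this is roughly what the pattern looks like using the thread pool facilities in the Java standard library; the request handling here is a made-up stand-in, not anything specific.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PooledServer {
        public static void main(String[] args) {
            // A fixed collection of quiescent worker threads; submitted work is
            // queued and handed to whichever worker becomes free.
            ExecutorService pool = Executors.newFixedThreadPool(8);

            for (int i = 0; i < 100; i++) {
                final int requestId = i;  // stand-in for an incoming HTTP request
                pool.submit(() -> handleRequest(requestId));
            }
            pool.shutdown();
        }

        static void handleRequest(int requestId) {
            System.out.println("handled request " + requestId
                + " on " + Thread.currentThread().getName());
        }
    }

With that picture in mind, consider the issues such a pool can introduce.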
The first, of course, is that your analysis of the cost problem could be wrong. It is frequently the case that we develop intuitions about such things that are rooted in the technological particulars of the time and place in which we learned them, but technology evolves independent of whether or not we bother to keep our intuitions in sync with it. For example, here are some beliefs that used to be true but are no longer, yet are still held by some of my contemporaries: integer arithmetic is always much faster than floating point, operations on individual bytes are faster than operations on larger word sizes, doing a table lookup is faster than doing a multiplication, and writable CDs are an efficient and cost-effective backup medium. In the case of threads, it turns out that some operating systems can launch a new thread faster than you could pull one out of some kind of pool data structure and transfer control to it (though this does not account for the fact that you might also have a bunch of time-consuming, application-specific initialization that you would then have to do, which could be a problem and tilt the balance in favor of pooling anyway).
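The remedy for stale intuitions is to measure rather than assume. The following is a crude probe, not a rigorous benchmark: it times launching a fresh thread for each unit of (empty) work against handing the same work to a pre-built pool. The numbers will vary wildly with OS, JVM, and workload, which is rather the point.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ThreadCostProbe {
        public static void main(String[] args) throws Exception {
            int n = 10_000;

            // Cost of launching a fresh thread on demand for each task.
            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++) {
                Thread t = new Thread(() -> { });
                t.start();
                t.join();
            }
            long fresh = System.nanoTime() - t0;

            // Cost of handing the same tasks to an already-built pool.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            long t1 = System.nanoTime();
            for (int i = 0; i < n; i++) {
                pool.submit(() -> { }).get();
            }
            long pooled = System.nanoTime() - t1;
            pool.shutdown();

            System.out.printf("fresh: %d ms, pooled: %d ms%n",
                fresh / 1_000_000, pooled / 1_000_000);
        }
    }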
Another potential problem is that because a thread is not a fundamental resource, you are introducing an opportunity for unnecessary resource contention at a different level of abstraction. Each thread consumes memory and processor time, contending with other threads in the system, including ones that are not part of your thread pool. For instance, you might have two applications running, each of which wants to maintain a thread pool. The thread pool idea encourages each application’s developer to imagine that he or she is in complete control of the available thread resources, which of course they are not. Idle threads in one application tie up resources that could be used for doing work in the other. All the strategies for coping with this yourself are unsatisfactory in some way. You could say “don’t run more than one thread-pooled application on a machine”, but this is just relocating the boundary between the application with the unused thread resources and the application that needs them, forcing it to correspond to the machine boundary. Or you could require the allotment of threads to the two applications to be coordinated, but this introduces a complicated coupling between them when there might be no other reason for the people responsible for each to ever have to interact at all. Or you could combine the two applications in some way so that they share a single thread pool, but this introduces a potential bonanza of new ways for the two applications to interfere with each other in normal operation. If threads really were a fundamental resource, you would delegate their management to some kind of external manager; nominally this would be the operating system, but in fact this is a big part of the service provided by so-called “application containers” such as Java application servers like JBoss or Tomcat. However, these containers often provide much poorer isolation across application boundaries than do, say, operating system processes in a reasonable OS, and so even in these environments we still find ourselves falling back on one of the unsatisfactory approaches just discussed above, such as dedicating a separate app server process or even machine to each application. Note, by the way, that the problem of different uses contending for blocks of a common resource type is a perennial issue with all kinds of caching strategies, even within the scope of a single application in a single address space!
A final, related problem with things like thread pools is configuration management — basically, how many threads do you tell the application to create for its thread pool? The answer, of course, is that you have no a priori or analytic way to know, so you end up with an additional test-and-tune cycle as part of your development and deployment process, when you would like to be able to just tell the OS, “here, you take care of this.” Either that or you end up trying to have your pool manager figure this out dynamically, which ultimately leads to recapitulating the history of operating system kernel development but without the benefit of all the lessons that OS developers have learned over the years.
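In practice the knob ends up looking something like this: a tunable with no analytically correct value, defaulted to a guess such as the number of processor cores and then adjusted by trial and error. The property name here is invented for illustration.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolConfig {
        public static void main(String[] args) {
            // A deployment-time tunable: nobody can derive the right value a priori,
            // so it gets set by test-and-tune, starting from a guess.
            int poolSize = Integer.getInteger("app.threadPoolSize",
                Runtime.getRuntime().availableProcessors());
            ExecutorService pool = Executors.newFixedThreadPool(poolSize);
            System.out.println("pool size = " + poolSize);
            pool.shutdown();
        }
    }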
Despite all of these issues, it may still prove the case that a thread pool is the best option available given the functional constraints that you must operate within. However, it’s at least worth asking the question of whether the cost benefit tradeoff is favorable rather than simply assuming that you have to do things in a particular way because you just know something that maybe you don’t.
More generally, the kinds of potential issues I just described in the context of thread pools may apply in any kind of circumstance where what you are pooling is not a fundamental resource. Such pooling is often motivated by unexamined theories of resource cost. For example, network connections are sometimes pooled because they are perceived as expensive to have open, whereas often the expense is not the connection per se but some other expensive component that happens to be correlated with the connection, such as a process or a thread or a database cursor. In such a case, rearchitecting to break this correlation may be a superior approach to introducing another kind of pool. Pools can be used to amortize the cost of something expensive to create over multiple uses, or to share a resource that is expensive to hold. In either case, it may be more effective to confront that cost directly. It is possible that your beliefs about the cost are simply wrong, or that by creating or holding the resource in some other way the actual cost can be reduced or eliminated.
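As one illustration of breaking such a correlation, suppose the expensive thing correlated with each connection is a dedicated thread. A selector-based design lets a single thread watch many open connections, at which point holding a connection open becomes cheap and much of the case for pooling connections evaporates. This is a bare sketch of that arrangement, not a complete server.

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    public class SelectorSketch {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select();  // block until some connection has something to do
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        // New connection: cheap to keep open, since it no longer
                        // ties up a thread of its own.
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(1024);
                        if (client.read(buf) < 0) {
                            client.close();  // peer closed; drop the connection
                        }
                        // handle whatever arrived in buf here
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }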
For all these reasons, when I hear that you have a pool, my first instinct is to ask if you really need it and if it is a good idea or not. It is entirely possible that the answer to both questions will be “yes”, but you should at least consider the matter.
I fucking love this post.
I’m a game developer, working primarily with the Unity game engine. It’s a good engine, but very beginner-friendly – which leads to a community dominated by beginners, with a terrible Dunning-Kruger effect and a LOT of bad advice.
One such piece of advice is using “object pools” to mitigate the costs of instantiating new objects. Which actually makes sense – IF YOU’RE DOING IT EVERY FRAME OR AT LEAST EVERY 10-20 FRAMES.
Also, I’m more of a senior developer, and thus I often come aboard existing projects that need saving. I see object pools EVERY TIME. And every time, I see errors caused by these object pools being implemented incorrectly – data not returned to its initial state, references to objects that have already been returned to the pool, etc.
And every time, when I remove those object pools entirely and return to naive instantiation of objects, I see no performance penalty whatsoever.
Posted by: Max Yankov | January 10, 2018, 5:52 am
Question: if kernel/os developers have already solved this problem (I’m assuming many times), would it be possible to somehow expose the kernel level pooling capability so it could be used directly in higher level facilities/languages? Or is this what already happens with kernel process scheduling/management?
Posted by: Todd Holley | June 24, 2018, 10:19 am
The mechanics of pooling per se are fairly simple, so it’s not a particular burden to implement one yourself if that’s what is required. The issue is that if the resource being managed is truly fundamental, it makes sense for it to be handled by the OS anyway, and if it’s not then it’s probably not something that should be pooled.
Posted by: Chip | June 25, 2018, 11:49 am