Friday, April 19, 2024

A few bumps on the road to HTTP/2

I recently blogged about making concurrent REST calls at scale over HTTP. I observed that, thanks to async I/O and/or virtual threads, it's feasible to make large numbers of requests in parallel. But while threads are cheap, there is typically another limiting factor: the HTTP connection pool. The pool serves as a semaphore: if it's empty, you'll have to wait. I've seen default pool configurations with even tighter limits on connections to the same host, which is problematic for internal service-to-service calls.
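To make the pool point concrete, here's a minimal sketch using Apache HttpClient 4.x (my choice purely for illustration, not necessarily what your stack uses); out of the box that pool allows 20 connections in total and only 2 per route:

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
pool.setMaxTotal(200);            // connections across all hosts (default is only 20)
pool.setDefaultMaxPerRoute(20);   // connections per host (default is a mere 2)
CloseableHttpClient client = HttpClients.custom()
        .setConnectionManager(pool)
        .build();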

On the other hand, the HTTP/2 protocol offers an end run around the pool: requests are multiplexed onto a single connection, and responses come back in any order. HTTP/2 seems to promise some serious performance benefits: we can send numerous requests without the overhead of setting up and tearing down a new connection each time, the protocol is more compact, and requests can be processed in parallel. Unfortunately, I found some gotchas while experimenting with HTTP/2 requests, which I'm going to note here for future reference.
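For reference, firing a batch of multiplexed requests with the JDK's HttpClient looks roughly like this; the host and the request count are placeholders of my own:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

HttpClient client = HttpClient.newBuilder()
        .version(HttpClient.Version.HTTP_2)   // negotiate HTTP/2, falling back to 1.1 if the server can't
        .build();

// Fire 500 GETs; with HTTP/2 they can all share one multiplexed connection.
List<CompletableFuture<HttpResponse<String>>> futures = IntStream.range(0, 500)
        .mapToObj(i -> HttpRequest.newBuilder(URI.create("https://service.internal/items/" + i)).build())
        .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString()))
        .collect(Collectors.toList());

futures.forEach(CompletableFuture::join);     // wait for all responses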


The SETTINGS_MAX_CONCURRENT_STREAMS problem

This server-side setting controls the maximum number of concurrent requests per connection. That is, just because you got yourself an HTTP/2 connection to the server doesn't mean you are home free. You're still limited in the number of in-flight requests you can put on that connection. Tomcat's max concurrent streams limit defaults to 100, though you can customize it. The client side might also have an equivalent setting. The question is, what happens under load when you hit the limit? You'd think the logical thing to do once you have the maximum number of concurrent requests is to either block or create a new connection. Unfortunately, what often happens is that the client will just blow up in your face.
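On the Tomcat side, the knob is the maxConcurrentStreams attribute on the HTTP/2 UpgradeProtocol element in server.xml. If you happen to run embedded Tomcat, the equivalent setup looks roughly like this (a sketch, not something I've verified against every Tomcat version):

import org.apache.catalina.connector.Connector;
import org.apache.catalina.startup.Tomcat;
import org.apache.coyote.http2.Http2Protocol;

Tomcat tomcat = new Tomcat();
Connector connector = tomcat.getConnector();   // the default HTTP/1.1 connector
Http2Protocol http2 = new Http2Protocol();
http2.setMaxConcurrentStreams(100);            // Tomcat's default; raise it if clients multiplex heavily
connector.addUpgradeProtocol(http2);           // allow upgrading this connector to HTTP/2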

For example, JDK 11's HttpClient has this code in Http2Connection:

if (clientInitiated && numReservedClientStreams >= maxConcurrentClientInitiatedStreams()) {
    throw new IOException("too many concurrent streams");
}

Not helpful. Or if the client code does not enforce this limit, the server might just as rudely kill the connection if the client is too demanding. I don't see a way to deal with this beyond either avoiding HTTP/2 or layering on my own semaphore (because the existing connection pool pattern is useless in this context). 
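Here's a rough sketch of what that home-grown semaphore could look like; ThrottledClient and its limit are made-up names, and the cap has to stay safely below whatever the server advertises:

import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Hypothetical wrapper: caps in-flight requests below the server's stream limit.
class ThrottledClient {
    private final HttpClient client;
    private final Semaphore inFlight;

    ThrottledClient(HttpClient client, int maxInFlight) {
        this.client = client;
        this.inFlight = new Semaphore(maxInFlight);
    }

    <T> CompletableFuture<HttpResponse<T>> send(HttpRequest request,
                                                HttpResponse.BodyHandler<T> handler)
            throws InterruptedException {
        inFlight.acquire();                      // block until a stream "slot" frees up
        return client.sendAsync(request, handler)
                     .whenComplete((resp, err) -> inFlight.release());
    }
}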

The load balancing problem

At scale, you rarely run a single server. You'd run a cluster of nodes, with a load balancer in front to direct each request. Whether it's a simplistic round-robin or something more sophisticated, the load-balancing pattern breaks down in the face of the long-lived connections HTTP/2 relies on. The problem is that once a connection is made to a single node, that's where all your HTTP/2 requests will go until the cows come home. You'd have only one node servicing all your requests.

Load balancing needs to happen at the request level. In practice, that seems to mean client-side selection of nodes. Whether you manually maintain that list of IPs, or drop a service mesh on top of things, it all seems rather heavy-weight. 
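For what it's worth, a bare-bones version of that client-side selection might look like the following; the node list is hard-coded purely for illustration, and the JDK client will keep a separate HTTP/2 connection per node, which is exactly the point:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin over known node addresses, so requests (not connections) get balanced.
class RoundRobinClient {
    private final List<String> nodes;   // e.g. List.of("https://10.0.0.1:8443", "https://10.0.0.2:8443")
    private final AtomicInteger next = new AtomicInteger();
    private final HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)
            .build();

    RoundRobinClient(List<String> nodes) {
        this.nodes = nodes;
    }

    CompletableFuture<HttpResponse<String>> get(String path) {
        String base = nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
        HttpRequest request = HttpRequest.newBuilder(URI.create(base + path)).build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString());
    }
}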

