Solving the serverless concurrency problem with Google Cloud Run
When Google announced the general availability of their latest fully managed deployment offering, Google Cloud Run, we wondered if this “serverless-like” deployment option might help remove one of the biggest challenges of deploying large-scale applications to serverless.
Google Cloud Platform (GCP) already provides an excellent serverless offering: Google Cloud Functions. Cloud Functions allows for rapid prototyping and deployment of serverless resources into the cloud. However, when building an application at scale, Cloud Functions has some subtle constraints when compared to deploying containers to a Kubernetes cluster.
In this article, we dive into why Google Cloud Run’s concurrency model provides some specific advantages over Cloud Functions for a typical REST-type API service. This concurrency benefit has less impact when deploying CPU-intensive services, such as machine learning, image and video transformations, or server-side rendering.
For context, I have worked on mission-critical serverless deployments and large-scale applications, and on projects that involved moving features from a managed Kubernetes cluster to Google Cloud Functions. These experiences gave me a better understanding of the benefits, and the unexpected pain, that can arise when using serverless Cloud Functions.
What is Cloud Run?
A typical serverless offering, such as Google Cloud Functions, manages every aspect of the server so that developers can focus on business logic. Behind the scenes, the public cloud provider builds containers with one or more runtimes (Node.js being our preferred runtime) and calls a function deployed by the developer. Once deployed, the provider manages scaling, including the ability to scale to zero when there is no load on the service.
Cloud Run works similarly once deployed but gives the responsibility of building the container to the developer. Assuming the container can respond to HTTP requests, a container with any runtime or binaries can be built and then deployed with Cloud Run. Google describes Cloud Run as a “fully managed compute platform for deploying and scaling containerised applications quickly and securely.”
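As a rough illustration of that contract, here is a minimal sketch of a Node.js server that could be packaged into such a container. It uses only Node's built-in `http` module and assumes the platform passes the port to listen on through the `PORT` environment variable, falling back to 8080. The `resolvePort`, `handler` and `start` names are ours for illustration and are not part of any Cloud Run API.

```javascript
const http = require('http');

// Hypothetical helper: read the port from the environment, falling back to 8080.
function resolvePort(env = process.env) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 ? port : 8080;
}

// Any HTTP handler works; this one just returns a small JSON payload.
function handler(req, res) {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ status: 'ok', path: req.url }));
}

// Hypothetical entry point: call start() from the container's start command.
function start() {
  const port = resolvePort();
  return http.createServer(handler).listen(port, () => {
    console.log(`listening on port ${port}`);
  });
}
```

Calling `start()` at the end of the file is all the container needs to do to begin answering HTTP requests. Everything else (runtime version, system binaries, dependencies) is decided by whoever builds the image.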
Cloud Functions is like a hotel room where you enter the room, take a shower and then sleep. In contrast, Cloud Run is like renting an unfurnished apartment: First you have to bring your bed, sheets, pillows and soap before you can take a shower and sleep.
You are paying for a service, but you decide the balance between customisation, effort and price to use that service.
Scaling and Concurrency
One of the most exciting features of Google Cloud Run is its concurrency setting, which controls how many requests each container instance may handle at once. As Google states in the Cloud Run documentation, this sets the maximum allowed concurrency before the service is scaled up. It is important to note that other factors, such as CPU usage, also affect when services scale.
Node.js, Concurrency and the Event Loop
IO operations are inherently slow for applications. IO operations consist of reading or writing files, making network requests or making requests to other services such as a database. If Node.js waited for every IO operation to complete before continuing to execute the application, the application would essentially be blocked every time an IO operation was requested.
Instead, when a Node.js application attempts to make an IO operation, Node.js hands that operation off and continues executing the application code. Where the operating system kernel supports asynchronous IO (as it does for network sockets), the kernel does the waiting; otherwise (as for most file system operations) the work runs on a thread pool managed by libuv, the library underneath Node.js. In both cases, Node.js is notified when the operation has completed.
The ability to continue executing without waiting for an IO operation to complete is known as asynchronous IO. The following diagram is a simplified illustration of how the Node.js event loop, event queue and asynchronous IO work together:
The notification of a completed IO operation arrives in the event queue. When the currently executing code of the application is done executing (when the call stack is empty), Node.js checks the event queue for completed events and then notifies the application code.
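The ordering described above can be observed with a few lines of code. This is only a sketch: `setTimeout` stands in for a real IO operation, and `fakeIo` is a made-up helper name.

```javascript
const order = [];

// setTimeout stands in for an IO operation that completes later.
function fakeIo(callback) {
  setTimeout(callback, 0);
}

order.push('request IO');
fakeIo(() => {
  // Runs only after the synchronous code below has finished (empty call stack).
  order.push('IO completed');
  console.log(order.join(' -> '));
  // prints "request IO -> keep executing -> IO completed"
});
order.push('keep executing');
```

Even with a delay of zero, the callback cannot run until the currently executing code has finished, which is exactly the behaviour that lets Node.js keep working while IO is pending.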
It is important to mention that Node.js has worker threads (since Node.js v10). However, application JavaScript still runs on a single thread by default: each worker thread runs its own instance of the runtime inside the same process. Node.js’s magic comes from its event loop and asynchronous IO.
Cloud Function Concurrency Problem
One of the hidden costs of using serverless Cloud Functions is that the runtime limits the concurrent requests to each instance to one. Arguably, this can simplify some of the programming requirements because developers don’t have to worry about concurrency and usage of global variables is allowed.
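To make that trade-off concrete, here is a small sketch of why module-level state is only safe when an instance handles one request at a time. The handlers and the `simulateIo` helper are hypothetical, and the 10 ms delay is an arbitrary stand-in for a database call.

```javascript
const simulateIo = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let currentUser = null; // module-level state, shared by every request on the instance

// Safe only when an instance handles one request at a time.
async function unsafeHandler(user) {
  currentUser = user;
  await simulateIo(10); // another request may run here and overwrite currentUser
  return `hello ${currentUser}`;
}

// Keeping request data in local variables is safe at any concurrency.
async function safeHandler(user) {
  await simulateIo(10);
  return `hello ${user}`;
}

async function demo() {
  const unsafe = await Promise.all([unsafeHandler('alice'), unsafeHandler('bob')]);
  const safe = await Promise.all([safeHandler('alice'), safeHandler('bob')]);
  return { unsafe, safe };
}

// With two overlapping requests, both unsafe responses greet "bob".
demo().then((result) => console.log(result));
```

With a concurrency of one, `unsafeHandler` never overlaps with itself and the problem cannot occur. Code written that way needs reviewing before it is moved to a platform that allows concurrent requests per instance.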
However, this severely underuses Node.js’s event-driven IO, which allows a single Node.js instance to serve many requests concurrently. In other words, when using Cloud Functions, each instance is effectively occupied for the entire lifetime of a single request.
The result of this restricted concurrency in Cloud Functions is that the function may be scaled up if there are more requests than there are instances to handle those requests. For a service under heavy load, this can quickly result in a large amount of scaling. That scaling can have unexpected and possibly detrimental side effects.
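A back-of-the-envelope model shows how quickly this adds up. The function below is a deliberate simplification: it ignores CPU-based scaling, cold starts and any configured instance limits, and the figure of 80 requests per instance is just an illustrative value for a platform that allows concurrency.

```javascript
// Rough model: instances needed to serve a number of in-flight requests.
// Ignores CPU-based scaling, cold starts and any instance limits.
function instancesNeeded(concurrentRequests, concurrencyPerInstance) {
  return Math.ceil(concurrentRequests / concurrencyPerInstance);
}

console.log(instancesNeeded(1000, 1));  // 1000 instances at a concurrency of one
console.log(instancesNeeded(1000, 80)); // 13 instances at a concurrency of 80
```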
Consider an example of a typical backend web service API end-point. For example, imagine an API that:
1. Authenticates the incoming request
2. Gets some data from the database
3. Prepares the data and calls a third-party API
4. Stores the API result in the database
5. Returns the result to the caller
Typical API Request Lifecycle
Node.js is very efficient for serving an API like this because of its event-driven IO. In this example, if we consider the processing time for the request, the majority of that time is spent waiting for IO. This waiting is in steps 2, 3 and 4: waiting for the database and waiting for the response from the third-party API. During that time, a Node.js server can be processing other requests concurrently.
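The five steps can be sketched as a single async handler. Everything here is hypothetical: the step functions are stubs, `simulateIo` uses timers in place of real database and network calls, and the delays (20 ms, 50 ms, 20 ms) are made-up numbers. The `inFlight` counter is only there to show how many requests one process is juggling.

```javascript
const simulateIo = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Hypothetical stubs for the steps; the delays are made-up numbers.
const authenticate = async (req) => {
  if (!req.token) throw new Error('unauthorised');
};
const readFromDb = (id) => simulateIo(20, { id, name: 'example' });
const callThirdParty = (data) => simulateIo(50, { ...data, enriched: true });
const saveToDb = (result) => simulateIo(20, result);

let inFlight = 0; // requests currently being processed by this one process

async function handleRequest(req) {
  inFlight += 1;
  try {
    await authenticate(req);                   // 1. authenticate
    const data = await readFromDb(req.id);     // 2. read from the database (IO wait)
    const result = await callThirdParty(data); // 3. call the third-party API (IO wait)
    await saveToDb(result);                    // 4. store the result (IO wait)
    return result;                             // 5. return to the caller
  } finally {
    inFlight -= 1;
  }
}

async function main() {
  const started = Date.now();
  const requests = Array.from({ length: 50 }, (_, id) =>
    handleRequest({ id, token: 't' }));
  console.log(`in flight on one thread: ${inFlight}`); // 50
  await Promise.all(requests);
  console.log(`50 requests served in ~${Date.now() - started} ms`);
}

main();
```

Because the waits overlap, the total time should be close to the duration of a single request (about 90 ms of simulated waiting) rather than 50 times that. With a concurrency of one, the same 50 requests would need 50 instances, or would have to be served one after another.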
If we remove that concurrency, such as with Google Cloud Functions, we no longer benefit from the highly concurrent advantage of using Node.js. In many cases, there is less concern about the CPU and memory use when working with serverless. Those concerns have been abstracted away by the public cloud provider. Instead, we are billed according to some combination of the number of requests, the execution time of a request and the CPU usage.
The problem has been that this single concurrency model can result in unforeseen consequences on shared resources, but Cloud Run resolves this. The database is a prime example. When using a traditional relational database such as PostgreSQL, the database has a maximum number of concurrent connections. Servers are typically optimised using a database connection pool. The connection pool limits the maximum number of concurrent connections from a single server instance while allowing concurrent requests to efficiently share the pooled connections.
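To show the idea behind a connection pool, here is a toy implementation in plain JavaScript. It is not how any particular database driver implements pooling, and the `ConnectionPool` class and `fakeQuery` function are invented for this sketch. It only demonstrates the behaviour that matters here: at most `max` queries run at once, and the rest wait their turn.

```javascript
class ConnectionPool {
  constructor(max) {
    this.max = max;    // maximum number of simultaneous connections
    this.active = 0;   // connections currently in use
    this.waiters = []; // callers waiting for a free connection
  }

  acquire() {
    if (this.active < this.max) {
      this.active += 1;
      return Promise.resolve();
    }
    // Pool exhausted: wait until release() hands over a connection.
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release() {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the connection straight to the next waiter
    } else {
      this.active -= 1;
    }
  }

  async query(work) {
    await this.acquire();
    try {
      return await work();
    } finally {
      this.release();
    }
  }
}

const pool = new ConnectionPool(10);
const fakeQuery = () => new Promise((resolve) => setTimeout(() => resolve('row'), 20));

// 80 concurrent requests each run one query through the 10-connection pool.
const queries = Array.from({ length: 80 }, () => pool.query(fakeQuery));
console.log(pool.active, pool.waiters.length); // 10 70
Promise.all(queries).then((rows) => console.log(`${rows.length} queries completed`));
```

The waiting callers do not block the process. They are simply promises that resolve once a connection is free, so the instance can keep accepting and progressing other requests in the meantime.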
However, if each instance can only serve a single request and maintains an open connection to the database, there is a one-to-one correlation between the number of concurrent requests and the number of connections to the database. The result is that during moments of peak load, the database may become saturated with connections and eventually reject new connections. Cloud Functions provides a maximum instances setting that caps scaling to guard against this exact problem, but that cap also limits how many requests can be served at once.
Imagine a database instance that has a maximum number of connections of 100, using Cloud Functions:
Cloud Function Concurrency
Using Cloud Functions with concurrency of one, our service can be scaled to handle 100 concurrent requests before the database reaches its maximum connection limit.
This demonstrates the effective maximum request load that a serverless deployment using Google Cloud Functions can handle. Compare that to a traditional Node.js server making use of event-driven IO. The server can serve concurrent requests while waiting on IO such as the database or the third-party API (Cloud Function 1 and 2 in the image above). Also, since it has a database connection pool, it can allow parallel requests to the database.
Cloud Run Concurrency
What happens if we apply the same example to Google Cloud Run configured with the default maximum request concurrency of 80? If each container is configured with a connection pool size of 10, then each instance can allow 10 parallel queries to the database. Since the instance may receive up to 80 concurrent requests, the connection pool will automatically queue queries that are waiting for a database connection to be freed and returned to the pool. With a database limit of 100 connections, 10 such instances can run side by side and accept up to 800 concurrent requests. By serving up to 80 requests with 10 connections, there is a theoretical throughput increase by a factor of 8 before the database reaches its maximum connection limit!
Cloud Run Concurrency
That’s eight times more concurrent requests with the same number of database connections.
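The arithmetic behind that comparison can be written down in a few lines. This is only a model of the example in this article: it assumes every instance keeps its full pool of connections open, and the function and parameter names are ours.

```javascript
// Upper bound on concurrent requests before the database runs out of connections.
// Assumes every instance keeps `poolSize` connections open.
function maxConcurrentRequests({ dbMaxConnections, poolSize, concurrency }) {
  const maxInstances = Math.floor(dbMaxConnections / poolSize);
  return maxInstances * concurrency;
}

// Cloud Functions: one request and one connection per instance.
const cloudFunctions = maxConcurrentRequests({
  dbMaxConnections: 100, poolSize: 1, concurrency: 1,
});

// Cloud Run: 80 requests sharing a pool of 10 connections per instance.
const cloudRun = maxConcurrentRequests({
  dbMaxConnections: 100, poolSize: 10, concurrency: 80,
});

console.log(cloudFunctions);            // 100
console.log(cloudRun);                  // 800
console.log(cloudRun / cloudFunctions); // 8
```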
This example is clearly theoretical. It is important to note that 800 concurrent requests don’t mean 800 concurrent database queries. However, we leverage the event-driven IO architecture of Node.js with a database connection pool to potentially serve more concurrent requests with the same number of database connections. The time for each query to the database will have an impact on the overall time to serve a request, especially when multiple requests share connections in the connection pool. The advantage is letting the Node.js server efficiently serve concurrent requests with its event-driven IO.
The performance benefit described in this article applies to IO-intensive services, meaning services that access the database and call other services. In my experience, this is the typical web service use case. Notably, CPU-intensive services, such as machine learning, image and video manipulation, or server-side rendering, will not see the same performance enhancement from Cloud Run’s concurrency. On the contrary, these types of services may even see degraded performance if too many concurrent requests are allowed to be serviced by a single container. Fortunately, Cloud Run allows restricting the concurrency to one to essentially mimic the Cloud Functions single concurrency model.
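The reason CPU-intensive work behaves differently is that it occupies the single JavaScript thread instead of waiting on IO. The sketch below makes this visible. `busyWork` is a made-up stand-in for something like image processing, and the zero-delay timer stands in for another request that is ready to be served.

```javascript
// Burn CPU synchronously for roughly `ms` milliseconds.
function busyWork(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) iterations += 1;
  return iterations;
}

const scheduledAt = Date.now();
let firedAfterMs = null;

// Stands in for another request that is ready to be served immediately.
setTimeout(() => {
  firedAfterMs = Date.now() - scheduledAt;
  console.log(`other request waited ${firedAfterMs} ms`);
}, 0);

busyWork(200); // CPU-bound work: nothing else can run until this returns
```

The timer is due straight away, yet it cannot fire until `busyWork` returns, so the second "request" waits for the whole 200 ms. Adding more concurrent requests to such an instance only makes each of them slower, which is why a concurrency of one can be the better setting for this kind of service.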
We are really excited to start using Google Cloud Run for applications that require dynamic scaling to handle huge peak loads. The ability to manage the number of concurrent requests per instance allows us to leverage the already existing efficiencies of event-driven IO of Node.js.
Cody Zuschlag is a Senior Software Engineer at Nearform and a part-time instructor at the Université Savoie Mont Blanc in Annecy, France. He has extensive experience working with Node.js, cloud-first applications, and managed databases. His passion is creating the best developer experience and sharing technical knowledge to enable developers to create the best solutions. You can connect with Cody on LinkedIn.