Cloud Run: Learnings, Hacks and Tips
Over the past few weeks, I have been working on improving the efficiency of our compute resources hosted on Cloud Run. The aim was to reduce cold-start latency and cut cloud spending.
In this blog post, I would like to share some of the learnings, hacks and tips that I have discovered during this exercise.
Use container startup latency to measure cold start
Container startup latency is the time between when an instance is started and when it is ready to receive requests.
It is driven mainly by what your code does at startup, so that is where to optimise.
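Besides the startup latency that Cloud Run reports, it can help to log from inside the container how long the process takes to start listening. A minimal sketch using only the Python standard library, assuming the server reads its port from the `PORT` environment variable that Cloud Run sets. `make_server` and `OkHandler` are illustrative names, and the timestamp only covers time after the interpreter starts, not image pull or sandbox startup.

```python
import os
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple

# Reference point for "time to listen". It only covers time after the Python
# interpreter has started and the stdlib modules above are imported, so import
# heavy application modules after this line to include them in the measurement.
_PROCESS_T0 = time.monotonic()


class OkHandler(BaseHTTPRequestHandler):
    """Answers every GET with a tiny 200 response."""

    def do_GET(self) -> None:
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int) -> Tuple[ThreadingHTTPServer, float]:
    """Bind and listen on the port; return the server and elapsed startup time in ms."""
    server = ThreadingHTTPServer(("0.0.0.0", port), OkHandler)
    startup_ms = (time.monotonic() - _PROCESS_T0) * 1000.0
    return server, startup_ms


if __name__ == "__main__":
    # Cloud Run tells the container which port to listen on via $PORT.
    port = int(os.environ.get("PORT", "8080"))
    server, startup_ms = make_server(port)
    print(f"listening on :{port} after {startup_ms:.1f} ms", flush=True)
    server.serve_forever()
```

Logging this number on every start makes it easy to see which dependency or initialisation step moved it.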
Minimum instances remove the cold start incurred when scaling from zero to one instance, but they don't help with the cold starts that occur as traffic scales out to more instances. Startup CPU boost addresses those: it allocates extra CPU to the container while it starts, which speeds up every cold start.
Min instances target the “0 → 1” case and eliminate that cold start entirely.
Startup CPU boost targets the “N → N+1” case: it speeds up cold starts but does not eliminate them.
Use both, especially since CPU boost has little impact on your bill.
For Cloud Run services, the container must send a response within the configured request timeout after it receives a request, and that timeout includes the container startup time. Otherwise the request is ended and a 504 error is returned.
Instances must start listening for requests within 4 minutes of being started, and all containers within the instance need to be healthy.
A request waiting for an instance to start is kept in a queue for a maximum of 10 seconds.
If one or more Cloud Run containers exceed the total container memory limit, the instance is terminated. All requests that are still processing on the instance end with an HTTP 500 error.
| Status | Cause |
|---|---|
| HTTP 401 | Client is not authenticated properly |
| HTTP 403 | Client is not authorized to invoke the service |
| HTTP 404 | Not Found |
| HTTP 429 | No available container instances |
| HTTP 500 | Cloud Run couldn’t manage the rate of traffic |
| HTTP 500 / HTTP 503 | Container instances are exceeding memory limits |
| HTTP 503 | Malformed response or container instance connection issue |
| HTTP 503 | Unable to process some requests due to high concurrency setting |
| HTTP 504 | Gateway timeout error |
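Some of these statuses are transient from the client's side: a 429 or 503 during a traffic spike may succeed once more instances are up. A sketch of client-side retries with capped exponential backoff and full jitter. Which codes count as retryable (`RETRYABLE`) is an assumption to adjust per service, retries are only safe for idempotent requests, and `call_with_retry` with its `send` callable are hypothetical names.

```python
import random
import time
from typing import Callable, Optional

# Assumption: treat these statuses as transient. Adjust for your service, and
# only retry requests that are safe to repeat (idempotent).
RETRYABLE = frozenset({429, 500, 503, 504})


def call_with_retry(
    send: Callable[[], int],
    max_retries: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    """Call `send` (which returns an HTTP status) until it succeeds or retries run out.

    Each delay is "full jitter": a random value between 0 and
    min(max_delay, base_delay * 2**attempt).
    """
    rng = rng or random.Random()
    status = send()
    attempt = 0
    while status in RETRYABLE and attempt < max_retries:
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        sleep(rng.uniform(0, ceiling))
        attempt += 1
        status = send()
    return status
```

Passing a no-op `sleep` keeps tests fast; the random jitter stops many clients from retrying in lockstep against a service that is still scaling out.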
You pay for CPU, memory and the traffic your application sends to clients (egress traffic).
Tier 2 pricing (in USD)
| Resource | Free Tier | Charged Rate |
|---|---|---|
| CPU | 100 milliseconds | $0.00003360 per vCPU-second |
| Memory | 128 MB | $0.00000350 per GB-second |
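To get a feel for what keeping an instance warm costs, the tier 2 rates above can be turned into a quick estimate. A sketch assuming one instance is billed at the full rate for every second, which makes it an upper bound: it ignores the free tier, the reduced rate for idle minimum instances, per-request fees and egress. `instance_cost` is an illustrative helper.

```python
# Tier 2 rates from the table above (USD). Assumption: every second counted
# here is billed at the full rate (no free tier, no idle-instance discount,
# no per-request fee, no egress).
CPU_USD_PER_VCPU_SECOND = 0.00003360
MEM_USD_PER_GB_SECOND = 0.00000350


def instance_cost(vcpu: float, memory_gb: float, seconds: float) -> float:
    """Upper-bound cost of one instance billed for `seconds` of CPU and memory time."""
    per_second = vcpu * CPU_USD_PER_VCPU_SECOND + memory_gb * MEM_USD_PER_GB_SECOND
    return seconds * per_second


if __name__ == "__main__":
    thirty_days = 30 * 24 * 3600  # 2,592,000 seconds
    # One always-busy instance with 1 vCPU and 0.5 GB of memory.
    print(f"${instance_cost(1, 0.5, thirty_days):.2f}")  # $91.63
```

Per second that instance costs 0.0000336 + 0.5 × 0.0000035 = $0.00003535, so a month of full-rate billing comes to about $91.63.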
Hacks to keep instances warm (and their impact on cost)
- Min instances (affects pricing)
- CPU boost (little impact on pricing)
- Periodically ping servers (infrastructure overhead)
  - You can also work around cold starts by periodically sending requests to your Cloud Run service, which can help prevent its container instances from scaling to zero. For this, use Google Cloud Scheduler to send a request every few minutes.
  - Not recommended: once more than one container instance is running, there is no guarantee that request n and request n + 1 reach the same instance, even with pre-warming.
  - Or maybe not.
- SIGTERM / SIGKILL infinite loop
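Whatever the exact trick behind the last item, Cloud Run sends SIGTERM before shutting an instance down and follows with SIGKILL if the container has not exited after 10 seconds, and SIGKILL cannot be caught. The dependable use of that window is a graceful shutdown. A minimal Python sketch; `shutting_down` and `install_sigterm_handler` are hypothetical names, and how the app drains in-flight work is left to it.

```python
import signal
import sys
import threading
from types import FrameType
from typing import Optional

# Set once SIGTERM arrives; request handlers and background workers can
# check it to stop taking new work.
shutting_down = threading.Event()


def _on_sigterm(signum: int, frame: Optional[FrameType]) -> None:
    # Keep this short: the container has about 10 seconds before SIGKILL.
    shutting_down.set()
    print("SIGTERM received, draining before exit", file=sys.stderr, flush=True)


def install_sigterm_handler() -> None:
    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGTERM, _on_sigterm)
```

A worker loop can then run `while not shutting_down.is_set(): ...` and flush buffers or close connections once it falls out of the loop.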
Tips for performance optimisation
- Minimize the number and size of the dependencies that your app loads
- Keep your app’s startup time (its “time to listen for requests”) short
- Prevent your application process from crashing
- The size of your container image has almost no impact on cold starts