Cloud Run: Learnings, hacks and tips

23 Sep 2023

Over the past few weeks, I have been working on making our compute resources hosted on Cloud Run more efficient. The aim was to reduce cold-start latency and cloud spending.

In this blog post, I would like to share some of the learnings, hacks and tips that I have discovered during this exercise.

. . .

Learnings

How to measure cold starts

Use the container startup latency metric to measure cold starts.

It is the time between when an instance is started and when it’s ready to receive requests.

This latency is primarily influenced by what your code does at startup, so that is the first place to optimize.

Why minimum instances don’t always solve the problem

Minimum instances remove the cold start encountered when going from zero to one instance, but they aren’t a solution for every cold start: new instances still start cold as traffic scales out to higher instance counts. Startup CPU boost, by contrast, can help speed up every cold start.

Why use startup CPU boost over min instances

Min instances are mainly for the “0 > 1” case and completely eliminate the cold start in that case.

Startup CPU Boost is here for N > N+1 and speeds up cold starts but does not eliminate them.

Use both, notably because CPU Boost doesn’t impact your bill much.

How request timeout setting works

For Cloud Run services, your container must send a response within the time specified in the request timeout setting after it receives a request, including the container startup time. Otherwise the request is ended and a 504 error is returned.
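
Because the timeout covers startup as well as the handler's own work, a request that lands on a cold instance has a smaller budget than one that lands on a warm instance. The following is a minimal sketch of that arithmetic; the function name and all the numbers are hypothetical and only illustrate the rule described above.

```python
def fits_request_timeout(startup_s: float, handler_s: float, timeout_s: float) -> bool:
    """Return True if startup time plus handler time fits within the request timeout."""
    return startup_s + handler_s <= timeout_s


# Hypothetical numbers: 8 s cold start, 55 s of work, 60 s request timeout.
print(fits_request_timeout(8, 55, 60))  # False: 63 s exceeds the 60 s budget
print(fits_request_timeout(0, 55, 60))  # True: the same request fits on a warm instance
```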

Container lifecycle

Instances must listen for requests within 4 minutes after being started and all containers within the instance need to be healthy.

A request waiting for an instance to start will be kept in a queue for a maximum of 10 seconds.

If one or more Cloud Run containers exceed the total container memory limit, the instance is terminated. All requests that are still processing on the instance end with an HTTP 500 error.

Cloud Run serving errors

Code	Reason
HTTP 401	Client is not authenticated properly
HTTP 403	Client is not authorized to invoke or call the service
HTTP 404	Not Found
HTTP 429	No available container instances
HTTP 500	Cloud Run couldn’t manage the rate of traffic
HTTP 500 / HTTP 503	Container instances are exceeding memory limits
HTTP 503	Malformed response or container instance connection issue
HTTP 503	Unable to process some requests due to high concurrency setting
HTTP 504	Gateway timeout error
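
On the client side, the table above suggests a rough split between errors that may clear up on their own and errors that will not. The sketch below is one possible way to encode that. Which codes count as retryable is my own assumption, not a Cloud Run rule, and the function names and backoff parameters are hypothetical.

```python
import random

# Assumption: capacity and timeout errors may succeed on a later attempt.
RETRYABLE = {429, 500, 503, 504}
# Assumption: auth and routing errors will not change without a fix on the caller's side.
NOT_RETRYABLE = {401, 403, 404}


def should_retry(status: int) -> bool:
    """Return True for status codes this sketch treats as transient."""
    return status in RETRYABLE


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0) -> float:
    """Exponential backoff with full jitter; attempt counts from 0."""
    return random.uniform(0, min(cap_s, base_s * 2 ** attempt))
```

Jitter spreads retries out so that many clients hitting a 429 at once do not all come back at the same moment.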

How pricing works in Cloud Run

You pay for CPU, memory and the traffic sent from your application to the client (egress traffic).

Tier 2 pricing (in USD)

Resource	Free Tier	Charged Rate
CPU	100 milliseconds	$0.00003360 per vCPU-second
Memory	128 MB	$0.00000350 per GB-second
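
To get a feel for these rates, here is a rough estimate of the compute cost of one instance. It is a simplified sketch: it uses only the two charged rates from the table and ignores the free tier, egress, any per-request fees and any different rate for idle instances. The function name and the instance size are hypothetical.

```python
CPU_RATE = 0.00003360  # USD per vCPU-second (Tier 2 rate from the table)
MEM_RATE = 0.00000350  # USD per GB-second (Tier 2 rate from the table)


def instance_cost(vcpus: float, memory_gb: float, billable_seconds: float) -> float:
    """Estimated USD cost of CPU and memory for the given billable time."""
    return billable_seconds * (vcpus * CPU_RATE + memory_gb * MEM_RATE)


# Hypothetical instance: 1 vCPU and 0.5 GB, billed for every second of 30 days.
print(f"{instance_cost(1, 0.5, 30 * 24 * 3600):.2f}")  # 91.63
```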

. . .

Hacks (vs impact on cost) to keep the instances warm

Min instances (affects pricing)
CPU boost (apparently has little effect on pricing)
Periodically ping servers (infrastructure overhead)
- You can also work around cold starts by periodically making requests to your Cloud Run service, which can help prevent the container instances from scaling to zero. For this, use Google Cloud Scheduler to make requests every few minutes.
- Not recommended: a service usually runs on more than one container instance, so you are not assured that requests ‘n’ and ‘n + 1’ reach the same instance, even with pre-warming.
- Or maybe not
SIGTERM / SIGKILL infinite loop
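
For the periodic ping option in the list above, the request itself can be as small as the sketch below. This is only an illustration of what a scheduled job would send. The function name and the `opener` parameter (there so the function can be tried without a network) are my own, and the URL in the comment is a placeholder, not a real service.

```python
import urllib.error
import urllib.request


def ping(url: str, timeout_s: float = 10.0, opener=urllib.request.urlopen) -> int:
    """Send one GET request and return the HTTP status code (0 on network failure)."""
    try:
        with opener(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The service answered, but with an error status such as 429 or 503.
        return err.code
    except OSError:
        # Covers URLError, connection failures and socket timeouts.
        return 0


# Usage (placeholder URL): run this every few minutes from a scheduler.
# status = ping("https://my-service.example/healthz")
```

A single ping only touches one instance, so this keeps the service from scaling to zero but does not warm every instance.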

. . .

Tips for performance optimisation

Minimize the number and size of the dependencies that your app loads
Keep your app’s “time to listen for requests” startup time short
Prevent your application process from crashing
The size of your container image has almost no impact on cold starts
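
One way to keep the “time to listen for requests” short is to start the HTTP server first and defer expensive setup until it is actually needed. Below is a minimal sketch using only the Python standard library. `get_heavy_resource` is a hypothetical stand-in for whatever slow initialization your app does, and the server reads the `PORT` environment variable that Cloud Run provides to the container.

```python
import functools
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


@functools.lru_cache(maxsize=1)
def get_heavy_resource() -> dict:
    """Hypothetical expensive setup; runs once, on first use, not at startup."""
    time.sleep(0.01)  # stands in for loading a model, opening connections, etc.
    return {"ready": True}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        resource = get_heavy_resource()  # cached after the first call
        body = str(resource["ready"]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    # The server starts listening before any heavy setup has run.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Call `main()` from the container entrypoint. The trade-off is that the first request on each instance pays the setup cost instead of the startup phase, so this helps most when the setup is not needed by every request.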

. . .

Sunny
