Resource-based auto-tuning for Workers

Nikitha Suryadevara & Spencer Judge

Product, SDK Engineering

We're excited to announce resource-based auto-tuning for Workers -- a much-anticipated, Pre-release feature that aims to simplify Worker management in Temporal. Think of it as self-driving, but for Workers!

Before we dive into the details, let’s take a quick look at some key concepts.

  • A Temporal Worker is the entity you control that runs your code and works with the server to make forward progress on your Workflows and Activities. A single process can have one or many Workers.

  • Tasks are maintained in Task Queues. Tasks contain the information needed for a Worker to make progress on Workflows and Activities. There are two types of tasks:

  • Workflow Tasks, which include the type of the Workflow and some or all of the Event history.

  • Activity Tasks, which specify the Activity to run as well as inputs and other data.

  • Task Slots represent how much capacity Workers have to actually perform work concurrently.

Workers are responsible for making progress on your Workflows, and they do so by receiving Tasks from a Task Queue. When Workers start, they open long-polling connections to the Server to get new Tasks from the specified Task Queue. Before a Worker polls for a Task, a Slot is reserved; the Slot is marked as used once the Worker receives a Task. Increasing the number of Task Slots allows a Worker to execute more Tasks concurrently.

Current state of the world

Today, Workers have to be manually right-sized by monitoring a handful of metrics that indicate how long Tasks are staying unprocessed in Task Queues, and how much capacity Workers currently have.

Fundamentally, there are two situations you might encounter:

  1. All your Workers are at capacity, and the backlog of Tasks is growing. In this situation, you might need to horizontally scale out Workers, i.e. add more Workers.
  2. If the backlog of Tasks is growing, but Worker hosts still have capacity, you would increase the Workers' maxConcurrentWorkflowTaskExecutionSize or maxConcurrentActivityExecutionSize options, which define the total number of available Slots of each type for a Worker (see the sketch below). This is vertical scaling of Workers, i.e. increasing the amount of work a single Worker can accept.
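
For illustration, here is roughly what that manual vertical scaling looks like today, sketched against the TypeScript SDK, where the equivalent options are named maxConcurrentWorkflowTaskExecutions and maxConcurrentActivityTaskExecutions. The option names (and the example values) vary by SDK, so treat this as a sketch rather than a drop-in configuration.

```typescript
import { Worker } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  const worker = await Worker.create({
    taskQueue: 'orders',
    workflowsPath: require.resolve('./workflows'),
    activities,
    // Hand-picked slot counts: the values you currently have to revisit
    // whenever your workload or host size changes.
    maxConcurrentWorkflowTaskExecutions: 40,
    maxConcurrentActivityTaskExecutions: 100,
  });
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```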

The future! Introducing resource-based auto-tuning for Workers

To minimize toil and simplify the configuration and management of Workers, we're introducing resource-based auto-tuning for Workers, starting with the automatic adjustment of available Slots[1]. This allows Workers to scale up to the CPU and memory limits of the underlying compute node your Workers run on.

Resource-based tuning is particularly suitable for two use cases:

  • Avoiding out-of-memory problems.
  • Handling large bursts of low-resource-usage Activities. For example, Activities that spend the vast majority of their time waiting on slow HTTP calls or other I/O-bound requests.

You can simply set targets for CPU and memory usage per Worker using the targetMemoryUsage and targetCpuUsage options in the Java, Go, TypeScript, .NET, and Python SDKs. Depending on how "utilized" you want your Worker to be, you can choose a value between 0 and 1[2]. Based on the targets you set, Slots will be allocated until resource usage reaches those values. Keep in mind that a value of 1 is typically ill-advised, especially for memory usage.
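
As a concrete sketch, here is what enabling resource-based tuning might look like with the TypeScript SDK. The shape of the tuner option shown here is illustrative and differs slightly between SDKs (and may change while the feature is in Pre-release), so check your SDK's reference for the exact names.

```typescript
import { Worker } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  const worker = await Worker.create({
    taskQueue: 'auto-tuned',
    workflowsPath: require.resolve('./workflows'),
    activities,
    // Resource-based tuning: the Worker keeps handing out Slots until the
    // process approaches these CPU and memory utilization targets (0..1).
    // A memory target of 1.0 is typically ill-advised.
    tuner: {
      tunerOptions: {
        targetCpuUsage: 0.9,
        targetMemoryUsage: 0.8,
      },
    },
  });
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```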

If you set these target values, you no longer have to monitor low-level workflow task executions[3]. Instead, you can simply specify the maximum resource capacity that you would like your Workers to utilize, and just allow the SDK to auto-burst up to available capacity when needed.

To verify that auto-tuning is working as expected, monitor the overall CPU and memory usage of the system and confirm that it matches the target values you set when the Worker is fully loaded.

Optionally

You can more granularly control the allocation of Slots by setting values for minimumSlots, maximumSlots, and rampThrottleMs for Workflow Tasks and Activity Tasks respectively. These values are fully optional and the SDK will use sane defaults for each if unspecified. minimumSlots defines the minimum number of slots that will be issued without considering available resources, while maximumSlots defines the maximum number of slots permitted to be handed out.

rampThrottleMs is the minimum time, in milliseconds, that the SDK will wait between handing out new slots once the minimum number of slots has been issued. This is an advanced option that needs to exist because Workers cannot know a priori how many resources a given Task will consume. Because of this, it is necessary to wait a brief period between making slots available, so that the Worker can get a read on how resource usage has changed since the last Task began processing.

If you have Activities that you know spend some period of time idling before they start consuming significant amounts of CPU or memory, you might want to increase this value. Keep in mind that doing so will limit how quickly the Worker can burst up to the number of available slots.
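
Putting these knobs together, a tuned Worker might be configured roughly as follows. This is again a sketch against the TypeScript SDK, where the ramp option is expressed as a duration (e.g. '50ms') rather than rampThrottleMs; other SDKs spell these options differently.

```typescript
import { Worker } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  const worker = await Worker.create({
    taskQueue: 'auto-tuned',
    workflowsPath: require.resolve('./workflows'),
    activities,
    tuner: {
      tunerOptions: { targetCpuUsage: 0.9, targetMemoryUsage: 0.8 },
      // Optional per-Task-type bounds layered on top of resource-based tuning.
      workflowTaskSlotOptions: {
        minimumSlots: 5,      // slots issued without consulting resource usage
        maximumSlots: 500,    // hard ceiling on concurrent Workflow Tasks
        rampThrottle: '10ms', // minimum wait between handing out new slots
      },
      activityTaskSlotOptions: {
        minimumSlots: 1,
        maximumSlots: 2000,
        rampThrottle: '50ms',
      },
    },
  });
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```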

Alternatively

If you prefer not to use the resource-based tuning described above, you can simply set a static value, numSlots, for the maximum number of slots you want issued. Use this option when you have clear, fixed limits on how many concurrent instances of a Task type (typically Activities) should run, and you know the resources your Workers have available to them, either from experimentation or from observing a workload.
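
For completeness, a static configuration might look something like the following sketch (TypeScript again; the fixed-size slot supplier shape used here is an assumption and may be expressed differently in your SDK).

```typescript
import { Worker } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  const worker = await Worker.create({
    taskQueue: 'fixed-slots',
    workflowsPath: require.resolve('./workflows'),
    activities,
    // Static slot counts instead of resource-based tuning.
    tuner: {
      workflowTaskSlotSupplier: { type: 'fixed-size', numSlots: 40 },
      activityTaskSlotSupplier: { type: 'fixed-size', numSlots: 100 },
      localActivityTaskSlotSupplier: { type: 'fixed-size', numSlots: 100 },
    },
  });
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```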

What's coming next?

  • Support for auto-tuning of pollers in SDKs
  • Sample apps to reference for worker auto-tuning

Feedback

We would love to hear feedback and understand how auto-tuning is working for you. Please find us in the Temporal Community Slack or email us at product@temporal.io.

__

  1. Please note that this feature is in Pre-release (otherwise known as Experimental) and may be subject to change.
  2. These limits apply to resource usage system-wide and take into consideration language-specific concerns such as JVM heap size. This means that you probably don't want to run other resource-intensive processes on the same host as your Worker(s).
  3. Note that you can't set maxConcurrentWorkflowTaskExecutionSize or maxConcurrentActivityExecutionSize and resource-based targets for a Worker at the same time. The SDK will throw an error if you attempt to do so.