September 12, 2024
Announcing Auto-Tuning for Workers in Pre-Release
Resource-based auto-tuning for Workers, a much-anticipated feature that simplifies Worker management in Temporal, is now available in Pre-Release. Think of it as self-driving, but for Workers!
Before we dive into the details, let’s take a quick look at some key concepts.
A Temporal Worker is the entity you control that runs your code and works with the server to make forward progress on your Workflows and Activities. A single process can have one or many Workers.
Task Queues are owned by the Temporal Server, and Workers poll them for work. Task Queues serve to logically separate work, for example by segregating Workflows or Activities that have different functional or resource requirements.
Tasks are maintained in Task Queues. Tasks contain the information needed for a Worker to make progress on Workflows and Activities. There are two types of tasks:
Workflow Tasks, which include the type of the Workflow and some or all of the Event history.
Activity Tasks, which specify the Activity to run as well as inputs and other data.
Task Slots represent how much capacity Workers have to actually perform work concurrently.
Workers are responsible for making progress on your Workflows, and they do so by receiving Tasks from a Task Queue. When Workers start, they open long-polling connections to the Server to get new Tasks from the specified Task Queue. Before a Worker starts processing a Task, a Slot is reserved, and it is then marked as used once the Worker obtains a Task. Increasing the number of Task Slots allows a Worker to execute more Tasks concurrently.
Today, Workers must be manually right-sized by monitoring a handful of metrics that indicate how long Tasks sit unprocessed in Task Queues and how much capacity Workers currently have.
Fundamentally, there are two situations you might encounter:
All your Workers are at capacity, and the backlog of tasks is growing. In this situation, you might need to horizontally scale out Workers, i.e. add more Workers.
If the backlog of Tasks is growing but Worker hosts still have capacity, you would increase the Workers' maxConcurrentWorkflowTaskExecutionSize or maxConcurrentActivityExecutionSize options, which define the total number of available Slots of their respective types for a Worker. This is vertical scaling of Workers, i.e. increasing the amount of work a Worker can accept, as sketched below.
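For illustration, here is a minimal sketch of that manual vertical scaling using the TypeScript SDK. The option names quoted above follow the Go and Java SDKs; the TypeScript equivalents are assumed to be maxConcurrentWorkflowTaskExecutions and maxConcurrentActivityTaskExecutions, and the task queue name and module paths are placeholders.

```typescript
import { Worker } from '@temporalio/worker';
import * as activities from './activities'; // hypothetical activities module

async function run() {
  // Manually sized Worker: these static concurrency limits must be tuned by
  // hand based on observed Task Queue backlog and host capacity.
  const worker = await Worker.create({
    taskQueue: 'my-task-queue',                    // placeholder task queue name
    workflowsPath: require.resolve('./workflows'), // hypothetical workflows module
    activities,
    maxConcurrentWorkflowTaskExecutions: 40,  // static Slot count for Workflow Tasks
    maxConcurrentActivityTaskExecutions: 100, // static Slot count for Activity Tasks
  });
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```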
To minimize toil and simplify the configuration and management of Workers, we're introducing resource-based auto-tuning for Workers, starting with the automatic adjustment of available Slots. This allows Workers to scale up to the maximum bounds of CPU and memory of the underlying compute node your Workers run on.
Resource-based tuning is particularly suitable for two use cases:
Avoiding out-of-memory problems.
Handling large bursts of low-resource-usage Activities, for example Activities that spend the vast majority of their time waiting on slow HTTP or other I/O-bound requests.
You can simply set targets for CPU and memory usage per Worker using the targetMemoryUsage and targetCpuUsage options in the Java, Go, TypeScript, .NET, and Python SDKs. Depending on how "utilized" you want your Worker to be, choose a value between 0 and 1. Based on the targets you set, Slots will be allocated until resource usage reaches those values. Keep in mind that a value of 1 is typically ill-advised, especially for memory usage.
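As a rough illustration, here is a minimal sketch of what this looks like with the TypeScript SDK's tuner option; the exact field layout may differ slightly across SDKs and Pre-Release versions, and the task queue name and module paths are placeholders.

```typescript
import { Worker } from '@temporalio/worker';
import * as activities from './activities'; // hypothetical activities module

async function run() {
  const worker = await Worker.create({
    taskQueue: 'my-task-queue',                    // placeholder task queue name
    workflowsPath: require.resolve('./workflows'), // hypothetical workflows module
    activities,
    // Resource-based auto-tuning: the SDK hands out Slots until process
    // utilization approaches these targets (values between 0 and 1).
    tuner: {
      tunerOptions: {
        targetMemoryUsage: 0.8, // stay below 80% memory to reduce OOM risk
        targetCpuUsage: 0.9,    // allow bursting up to 90% CPU
      },
    },
  });
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```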
If you set these target values, you no longer have to monitor low-level Workflow Task executions. Instead, you can simply specify the maximum resource capacity you would like your Workers to utilize and allow the SDK to auto-burst up to available capacity when needed.
To verify that auto-tuning is working as expected, monitor the overall CPU and Memory usage of the system and confirm that it matches the target values you have set when the Worker is fully loaded.
Optionally, you can control the allocation of Slots more granularly by setting values for minimumSlots, maximumSlots, and rampThrottleMs for Workflow Tasks and Activity Tasks respectively. These values are fully optional, and the SDK will use sane defaults for each if unspecified. minimumSlots defines the minimum number of slots that will be issued without considering available resources, while maximumSlots defines the maximum number of slots permitted to be handed out.
rampThrottleMs is the minimum time, in milliseconds, that the SDK will wait between handing out new slots once the minimum number of slots has been passed. This is an advanced option that needs to exist because Workers cannot know a priori how many resources a given Task will consume. Because of this, the Worker must wait a brief period between making slots available so that it can get a read on how resource usage has changed since the last Task began processing.
If you have Activities that you know spend some period of time idling before they start consuming significant amounts of CPU or memory, you might want to increase this value. Keep in mind that doing so will limit how quickly the Worker can burst up to the number of available slots.
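Putting these knobs together, the sketch below shows a tuner with per-slot-type overrides that could be passed as the tuner option of Worker.create shown earlier. The shape follows the TypeScript SDK's resource-based slot options as we understand them; the exact spelling of the ramp option (rampThrottle vs. rampThrottleMs) varies by SDK, and all numbers are illustrative.

```typescript
// Hypothetical tuner with per-slot-type overrides; pass as `tuner` in Worker.create.
const tuner = {
  tunerOptions: { targetMemoryUsage: 0.8, targetCpuUsage: 0.9 },
  workflowTaskSlotOptions: {
    minimumSlots: 5,   // slots handed out regardless of available resources
    maximumSlots: 250, // hard ceiling even if resources are free
  },
  activityTaskSlotOptions: {
    minimumSlots: 1,
    maximumSlots: 1000,
    rampThrottle: 100, // min wait between new slots, in ms (rampThrottleMs in some SDKs)
  },
};
```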
Alternatively
If you prefer not to use the resource-based tuning described above, you can simply set a static value, numSlots, for the maximum number of slots you want issued. Use this option when you have clear, fixed limits on how many concurrent instances of a task type (typically Activities) should run and you know the resources your Workers have available; you can determine this ahead of time through experimentation or by observing a workload.
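For reference, here is a hedged sketch of fixed-size slot suppliers in the TypeScript SDK; the type tags and field layout below reflect our best understanding of the Pre-Release API, so treat them as assumptions and check the SDK reference for your language.

```typescript
// Hypothetical fixed-size tuner; pass as `tuner` in Worker.create shown earlier.
// A static number of slots per Task type, with no resource-based adjustment.
const fixedTuner = {
  workflowTaskSlotSupplier: { type: 'fixed-size', numSlots: 40 },
  activityTaskSlotSupplier: { type: 'fixed-size', numSlots: 100 },
  localActivityTaskSlotSupplier: { type: 'fixed-size', numSlots: 100 },
};
```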
What's next:
Support for resource-based auto-tuning of slots in additional SDKs
Support for auto-tuning of pollers in SDKs
Horizontal scaling of workers via task queue backlog
Sample apps to reference for worker auto-tuning