Introducing Temporal .NET – Deterministic Workflow Authoring


A Temporal workflow is code that is executed in a durable, reliable, and scalable way. Today Temporal allows you to write workflows in Go, Java, Python, TypeScript, and more. You can now add .NET to that list with the release of the .NET SDK. While this post will focus on C#, any .NET language will work.

Different language runtimes have different trade-offs for writing workflows. Go is very fast and resource efficient due to runtime-supported coroutines, but that comes at the expense of type safety (even generics as implemented in Go are limited for this use). Java is also very fast and type safe, but a bit less resource efficient due to the lack of runtime-supported coroutines (but virtual threads are coming). It might sound weird to say, but our dynamic languages of JS/TypeScript and Python are probably the most type-safe SDKs when used properly; however, as can be expected, they are not the most resource efficient. .NET provides the best of all worlds: high performance like Go/Java, good resource utilization like Go, and high quality type-safe APIs.

Webinar recording - Introducing Temporal .NET and how it was built.

This post will give a high-level overview of the .NET SDK and some interesting challenges encountered during its development. To get more info about the SDK, see:

GitHub Repository – The README here provides the most comprehensive docs for .NET at the moment
NuGet Package
Temporal Documentation – Contains general-purpose Temporal documentation with .NET-specific content coming soon
API Documentation
Samples

Contents:

Introduction to Temporal with C#
How It Works – Workflow Determinism
Future of the .NET SDK

NOTE: A previous version of this post used a “Ref” pattern to invoke workflows, activities, signals, and queries. This has been updated to use the latest lambda expressions supported in current .NET SDK versions.

Introduction to Temporal with C#

To give a quick walkthrough of Temporal .NET, we'll implement a simplified form of one-click buying in C# where a purchase is started and then, unless cancelled, will be performed in 10 seconds.

Implementing an Activity

Activities are the only way to interact with external resources in Temporal, such as making an HTTP request or accessing the file system. In .NET, activities are delegates, usually methods marked with the [Activity] attribute. Here's an activity that performs a purchase:

namespace MyNamespace;

using System.Net;
using System.Net.Http;
using System.Net.Http.Json;
using Temporalio.Activities;
using Temporalio.Exceptions;

public record Purchase(string ItemID, string UserID);

public class PurchaseActivities
{
    private readonly HttpClient client = new();

    [Activity]
    public async Task DoPurchaseAsync(Purchase purchase)
    {
        using var resp = await client.PostAsJsonAsync(
          "https://api.example.com/purchase",
          purchase,
          ActivityExecutionContext.Current.CancellationToken);

        // Make sure we succeeded
        try
        {
            resp.EnsureSuccessStatusCode();
        }
        catch (HttpRequestException e) when (resp.StatusCode < HttpStatusCode.InternalServerError)
        {
            // We don't want to retry 4xx status codes, only 5xx status codes
            throw new ApplicationFailureException("API returned error", e, nonRetryable: true);
        }
    }
}

This activity makes an HTTP call and takes care not to retry some types of HTTP errors.
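The activity above treats every status below 500 as permanent. Whether that is the right split depends on the API being called. As a hedged variation, the sketch below also treats 408 (Request Timeout) and 429 (Too Many Requests) as transient. HttpRetryPolicy and IsRetryable are hypothetical names introduced for illustration; they are not part of the Temporal SDK.

```csharp
using System.Net;

// Hypothetical helper (not part of Temporalio): decides whether a failed
// HTTP status looks transient and is therefore worth retrying.
public static class HttpRetryPolicy
{
    public static bool IsRetryable(HttpStatusCode status) =>
        (int)status >= 500                              // server-side failures
        || status == HttpStatusCode.RequestTimeout      // 408
        || status == HttpStatusCode.TooManyRequests;    // 429
}
```

With a helper like this, the exception filter in the activity could read `when (!HttpRetryPolicy.IsRetryable(resp.StatusCode))`, leaving the ApplicationFailureException with nonRetryable: true for everything else.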

Implementing a Workflow

Now that we have an activity, we can implement our workflow:

namespace MyNamespace;

using Temporalio.Workflows;

public enum PurchaseStatus
{
    Pending,
    Confirmed,
    Cancelled,
    Completed
}

[Workflow]
public class OneClickBuyWorkflow
{
    private PurchaseStatus currentStatus = PurchaseStatus.Pending;
    private Purchase? currentPurchase;

    [WorkflowRun]
    public async Task<PurchaseStatus> RunAsync(Purchase purchase)
    {
        currentPurchase = purchase;

        // Give user 10 seconds to cancel or update before we send it through
        try
        {
            await Workflow.DelayAsync(TimeSpan.FromSeconds(10));
        }
        catch (TaskCanceledException)
        {
            currentStatus = PurchaseStatus.Cancelled;
            return currentStatus;
        }

        // Update the status, perform the purchase, update the status again
        currentStatus = PurchaseStatus.Confirmed;
        await Workflow.ExecuteActivityAsync(
            (PurchaseActivities act) => act.DoPurchaseAsync(currentPurchase!),
            new() { ScheduleToCloseTimeout = TimeSpan.FromMinutes(2) });
        currentStatus = PurchaseStatus.Completed;
        return currentStatus;
    }

    [WorkflowSignal]
    public async Task UpdatePurchaseAsync(Purchase purchase) => currentPurchase = purchase;

    [WorkflowQuery]
    public PurchaseStatus CurrentStatus() => currentStatus;
}

Workflows must be deterministic, which the SDK supports by running them on a custom task scheduler (explained later in this post).

Notice the Workflow.DelayAsync call there? That is a durable Temporal timer. When a cancellation token is not provided to it, it defaults to Workflow.CancellationToken so that cancelling the workflow implicitly cancels the tasks being awaited. Workflows must use Temporal-defined timing and scheduling, so something like Task.Delay cannot be used. See the Workflow Determinism section below for more details.

Running a Worker

Workflows and activities are run in workers like so:

using MyNamespace;
using Temporalio.Client;
using Temporalio.Worker;

// Create a client to localhost on "default" namespace
var client = await TemporalClient.ConnectAsync(new("localhost:7233"));

// Cancellation token to shut down worker on ctrl+c
using var tokenSource = new CancellationTokenSource();
Console.CancelKeyPress += (_, eventArgs) =>
{
    tokenSource.Cancel();
    eventArgs.Cancel = true;
};

// Create an activity instance since we have instance activities. If we had
// all static activities, we could just reference those directly.
var activities = new PurchaseActivities();

// Create worker with the activity and workflow registered
using var worker = new TemporalWorker(
    client,
    new TemporalWorkerOptions(taskQueue: "my-task-queue").
        AddActivity(activities.DoPurchaseAsync).
        AddWorkflow<OneClickBuyWorkflow>());

// Run worker until cancelled
Console.WriteLine("Running worker");
try
{
    await worker.ExecuteAsync(tokenSource.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine("Worker cancelled");
}

When executed, the worker will listen for Temporal server requests to perform workflow and activity invocations.

Executing a Workflow

using MyNamespace;
using Temporalio.Client;

// Create a client to localhost on "default" namespace
var client = await TemporalClient.ConnectAsync(new("localhost:7233"));

// Start a workflow
var args = new Purchase(ItemID: "item1", UserID: "user1");
var handle = await client.StartWorkflowAsync(
    (OneClickBuyWorkflow wf) => wf.RunAsync(args),
    new(id: "my-workflow-id", taskQueue: "my-task-queue"));

// We can update the purchase if we want
var signalArgs = new Purchase(ItemID: "item2", UserID: "user1");
await handle.SignalAsync(wf => wf.UpdatePurchaseAsync(signalArgs));

// We can cancel it if we want
await handle.CancelAsync();

// We can query its status, even if the workflow is complete
var status = await handle.QueryAsync(wf => wf.CurrentStatus());
Console.WriteLine("Purchase workflow status: {0}", status);

// We can also wait on the result (which for our example is the same as query)
status = await handle.GetResultAsync();
Console.WriteLine("Purchase workflow result: {0}", status);

This is a tiny taste of the many features offered by Temporal .NET. See the .NET SDK README for more details.

How It Works – Workflow Determinism

In Temporal, workflows must be deterministic. This means in addition to disallowing all the obvious stuff like random and system time, Temporal must also have strict control over task scheduling and coroutines in order to ensure deterministic execution.

While Python and some other runtimes allow full control over the event loop (see this blog post), .NET does not. We use a custom TaskScheduler to order all created tasks deterministically, but two things remain outside our control. First, we cannot control timers. Second, many task management calls in .NET (e.g. Task.Run) implicitly use TaskScheduler.Default instead of the TaskScheduler.Current that workflows need. Some analyzer rules even discourage calls that implicitly use TaskScheduler.Current, though that is exactly what must be used in workflows. And sometimes it is not obvious that a call will end up on the default scheduler internally.
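To make the scheduler difference concrete, here is a minimal sketch that uses only the base class library. QueueingScheduler and SchedulerDemo are hypothetical names for illustration, and the scheduler is far simpler than the one the SDK uses. It queues tasks and runs them in order on the calling thread, then checks which scheduler two child tasks end up on.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: tasks are queued and only run, in FIFO order,
// when RunUntilIdle is called on the owning thread.
public sealed class QueueingScheduler : TaskScheduler
{
    private readonly Queue<Task> queue = new();

    protected override void QueueTask(Task task) => queue.Enqueue(task);

    // Never run inline, so ordering stays under the scheduler's control
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    protected override IEnumerable<Task> GetScheduledTasks() => queue.ToArray();

    public void RunUntilIdle()
    {
        while (queue.TryDequeue(out var task))
        {
            TryExecuteTask(task);
        }
    }
}

public static class SchedulerDemo
{
    // Returns whether each child task observed the custom scheduler
    public static (bool FactoryOnCustom, bool TaskRunOnCustom) Run()
    {
        var scheduler = new QueueingScheduler();
        var factoryOnCustom = false;
        var taskRunOnCustom = false;
        Task viaTaskRun = Task.CompletedTask;

        var root = new Task(() =>
        {
            // Explicitly targets the current (custom) scheduler
            Task.Factory.StartNew(
                () => { factoryOnCustom = ReferenceEquals(TaskScheduler.Current, scheduler); },
                CancellationToken.None,
                TaskCreationOptions.None,
                TaskScheduler.Current);

            // Task.Run always targets TaskScheduler.Default (the thread pool)
            viaTaskRun = Task.Run(
                () => { taskRunOnCustom = ReferenceEquals(TaskScheduler.Current, scheduler); });
        });

        root.Start(scheduler);
        scheduler.RunUntilIdle();
        viaTaskRun.Wait();
        return (factoryOnCustom, taskRunOnCustom);
    }
}
```

Calling SchedulerDemo.Run() returns (true, false): the StartNew child stays on the custom scheduler, while the Task.Run child escapes to the thread pool. That escape is the kind of call a workflow has to avoid.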

Ideally, to solve this and prevent other non-deterministic calls, we would run workflows in a sandbox. But recent versions of .NET have removed the tooling this would need (specifically the "Code Access Security" and "Partially Trusted Code" features). The same issues appear in the Temporal Go and Java SDKs, where we ask users not to do any platform threading/async outside of our deterministic scheduler.

So, we ask users to make sure all task calls are done on the current task scheduler and not to use timers. See the .NET SDK README for more details on what we limit.

We found it so hard to know which calls use threading and system timers in .NET that we are trying to detect these situations eagerly, both at runtime and at compile time. At runtime, by default, we enable a tracing EventListener that intercepts a select few info-level task events and checks, when running in a workflow, that all tasks are being performed on the proper scheduler. Technically this listener receives these specific task events regardless of whether a workflow is executing, but our check to disregard non-workflow events is very cheap (basically just a thread-local check), and the listener can be disabled if needed. When invalid task scheduling is encountered, the listener suspends the workflow (i.e. fails the "workflow task"). The workflow resumes once code is deployed with a fix.
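As a rough illustration of the mechanism, the sketch below subscribes an EventListener to the runtime's task event source and records the names of the events it sees. TaskEventRecorder and ListenerDemo are hypothetical names, and this is not the SDK's listener: it enables all keywords rather than a select few and performs no workflow or scheduler check.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Tracing;
using System.Threading.Tasks;

// Hypothetical listener for illustration: records the names of task events
public sealed class TaskEventRecorder : EventListener
{
    // Name of the runtime's Task Parallel Library event source
    private const string TplSourceName = "System.Threading.Tasks.TplEventSource";

    public ConcurrentQueue<string> EventNames { get; } = new();

    protected override void OnEventSourceCreated(EventSource source)
    {
        // Only subscribe to task events, at informational level
        if (source.Name == TplSourceName)
        {
            EnableEvents(source, EventLevel.Informational, EventKeywords.All);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        // A real check would inspect the event payload and the current
        // scheduler here; this sketch only records the event name.
        if (e.EventName is { } name)
        {
            EventNames.Enqueue(name);
        }
    }
}

public static class ListenerDemo
{
    // Runs one thread-pool task while listening and returns the event names seen
    public static string[] Run()
    {
        using var recorder = new TaskEventRecorder();
        Task.Run(() => { }).Wait();
        return recorder.EventNames.ToArray();
    }
}
```

Printing the result of ListenerDemo.Run() shows which task events the runtime emits for a single Task.Run call. A production listener would need a cheap early exit for events raised outside a workflow, as described above.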

In the future, there are two things that can help here. First, we want to create analyzers to find these mistakes at compile time (see "Future" section below). Second, for timers, the recently merged TimeProvider API will let us control timer creation on modern .NET versions instead of falling back to system timers.
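To show what controlling timer creation could look like, here is a minimal sketch of a custom TimeProvider. It assumes .NET 8 or later, where TimeProvider and the Task.Delay overload that accepts one are available. RecordingTimeProvider and its members are hypothetical names, and this is not how the SDK works today. Instead of starting a system timer, the provider keeps each requested timer and fires it only when told to.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical provider for illustration: timers never touch the system
// clock and fire only when FireAll is called.
public sealed class RecordingTimeProvider : TimeProvider
{
    private readonly List<ManualTimer> timers = new();

    public int TimersCreated => timers.Count;

    public override ITimer CreateTimer(
        TimerCallback callback, object? state, TimeSpan dueTime, TimeSpan period)
    {
        // A deterministic runtime could turn this request into a durable timer
        var timer = new ManualTimer(callback, state);
        timers.Add(timer);
        return timer;
    }

    // Fire every timer created so far, in creation order
    public void FireAll()
    {
        foreach (var timer in timers.ToArray())
        {
            timer.Fire();
        }
    }

    private sealed class ManualTimer : ITimer
    {
        private readonly TimerCallback callback;
        private readonly object? state;

        public ManualTimer(TimerCallback callback, object? state)
        {
            this.callback = callback;
            this.state = state;
        }

        public void Fire() => callback(state);

        // Rescheduling and disposal are no-ops in this sketch
        public bool Change(TimeSpan dueTime, TimeSpan period) => true;

        public void Dispose() { }

        public ValueTask DisposeAsync() => default;
    }
}
```

For example, `Task.Delay(TimeSpan.FromHours(1), provider)` with a RecordingTimeProvider stays incomplete until `provider.FireAll()` is called, regardless of wall-clock time. The limitation is that code has to pass the provider explicitly; calls such as a plain Task.Delay(TimeSpan) still use system timers.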

Future of the .NET SDK

The .NET SDK is a full-featured SDK on par with the others. There are three things we may add in the future.

First, we want to add source generation. We have the shape of activities and workflows, and therefore we can generate idiomatic caller-side structures to make invocation safer/easier. Source generation will always be optional, but may become the preferred way to call activities and workflows.

Second, we want to create a set of analyzers. We know what you can and can't call in a workflow, so static analysis to catch these invalid calls should be fairly easy to develop. This would work like any other .NET analyzer and is something we want to develop soon. Then again, maybe our approaches/investments in AI to find Temporal workflow mistakes will be completed first 🙂.

Finally, the new TimeProvider API will allow us to intercept timers in a much more transparent way for users. Granted, it will only work on the newest .NET versions.

The .NET SDK will be supported like other SDKs. Therefore, Temporal features like workflow updates will be added to the .NET SDK as they are added to other SDKs.

Try it out today! We want all feedback, positive or negative! Join us in #dotnet-sdk on Slack or on the forums.