End-to-end AI workflows with Temporal at Descript

Temporal is a game changer. It gave us the ability to test our workflows with unit tests.

Industry: High Tech
Use Case: Applied AI
Company Size: 51-250
SDK: Go
Temporal: Cloud

Descript is a modern video editor that uses AI extensively to make videos work more like documents. Their editing experience enables users to modify spoken language in audio and video on the fly by simply modifying a transcript. Their proprietary transcription process can also synthetically generate a speaker’s voice to add content to any video and then render and export these videos quickly and reliably.

After you upload a video to the platform, the first step is to create a transcript of it. This kicks off an asynchronous, multistage, parallelized process that re-encodes audio, splits it into chunks, makes external API calls, merges results that may arrive out of order, and then verifies their alignment. The process is exacting and complicated, and there are numerous points at which it can get held up.
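The fan-out, out-of-order merge, and verification steps described above can be sketched in plain Go. This is only an illustration under stated assumptions, not Descript's pipeline and not Temporal SDK code: `chunkResult`, `transcribeChunk`, and `transcribeAll` are hypothetical names, and the "transcription" is a stand-in that upper-cases its input.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// chunkResult is a hypothetical type: the transcript of one audio chunk,
// tagged with the chunk's position in the original recording.
type chunkResult struct {
	Index int
	Text  string
}

// transcribeChunk stands in for an external transcription API call.
func transcribeChunk(index int, audio string) chunkResult {
	return chunkResult{Index: index, Text: strings.ToUpper(audio)}
}

// transcribeAll fans out one goroutine per chunk, collects the results in
// whatever order they finish, then restores the original order and checks
// that no chunk is missing or duplicated.
func transcribeAll(chunks []string) (string, error) {
	results := make(chan chunkResult, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			results <- transcribeChunk(i, c)
		}(i, c)
	}
	wg.Wait()
	close(results)

	merged := make([]chunkResult, 0, len(chunks))
	for r := range results {
		merged = append(merged, r)
	}
	// Results may have arrived in any order; sort them by chunk index.
	sort.Slice(merged, func(a, b int) bool { return merged[a].Index < merged[b].Index })

	if len(merged) != len(chunks) {
		return "", fmt.Errorf("expected %d chunks, got %d", len(chunks), len(merged))
	}
	parts := make([]string, len(merged))
	for i, r := range merged {
		if r.Index != i {
			return "", fmt.Errorf("missing or duplicate chunk at position %d", i)
		}
		parts[i] = r.Text
	}
	return strings.Join(parts, " "), nil
}

func main() {
	text, err := transcribeAll([]string{"hello", "from", "descript"})
	fmt.Println(text, err)
}
```

Tagging each result with its chunk index is what makes arrival order irrelevant: the merge step only needs a sort and a gap check.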

Their legacy deployment of this process involved a patchwork of handwritten scripts and queues built with Node.js, RabbitMQ, and PostgreSQL. Application state was distributed across various queues and databases, and the system was difficult to build and test, which resulted in production issues that were nearly impossible to debug.

We had one incident every week just on the transcription workflow because it was too complicated to maintain. It was nearly impossible to test the logic for every new feature end to end, so we were afraid of doing any changes in that code path.

To code or not to code

Early on, this homegrown system was sufficient, but as Descript scaled, the team realized they needed a better way to build, test, debug, monitor, and scale this process. They evaluated a number of workflow orchestration tools and found that they strongly preferred Temporal’s “Workflow-as-Code” approach.

Descript wanted to write, test, deploy, and version-control workflows with the exact same languages and tools used in the rest of their stack. The ability to model complex dependencies and error handling as idiomatic Go code was a better fit than simple JSON/YAML DSLs, particularly for local debugging and production monitoring. They also had scale and latency concerns with other solutions.

It was a game-changing revelation: Temporal gave us the ability to test our workflows with unit tests, basically just writing code instead of writing JSON or YAML, which are completely untestable.
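To illustrate why workflow logic written as ordinary code is easy to unit test, here is a minimal sketch in plain Go. It deliberately does not use the Temporal SDK (which ships its own `testsuite` package for this purpose). The `Activities` interface, `TranscriptionWorkflow`, and `fakeActivities` are hypothetical names chosen for the example, not Descript's code.

```go
package main

import (
	"errors"
	"fmt"
)

// Activities is a hypothetical interface for the side-effecting steps
// (re-encoding, external API calls) that the workflow sequences.
type Activities interface {
	Reencode(input string) (string, error)
	Transcribe(audio string) (string, error)
}

// TranscriptionWorkflow is ordinary Go: sequencing plus error handling.
// Because the steps are injected, it can run against fakes in a unit test.
func TranscriptionWorkflow(a Activities, input string) (string, error) {
	audio, err := a.Reencode(input)
	if err != nil {
		return "", fmt.Errorf("reencode: %w", err)
	}
	text, err := a.Transcribe(audio)
	if err != nil {
		return "", fmt.Errorf("transcribe: %w", err)
	}
	return text, nil
}

// fakeActivities is a test double that never touches real audio or a
// real API. Setting failTranscribe simulates an external API outage.
type fakeActivities struct {
	failTranscribe bool
}

func (f fakeActivities) Reencode(input string) (string, error) {
	return input + ".wav", nil
}

func (f fakeActivities) Transcribe(audio string) (string, error) {
	if f.failTranscribe {
		return "", errors.New("api down")
	}
	return "transcript of " + audio, nil
}

func main() {
	// Happy path.
	text, err := TranscriptionWorkflow(fakeActivities{}, "talk.mp4")
	fmt.Println(text, err)

	// Failure path: the workflow should surface the wrapped error.
	_, err = TranscriptionWorkflow(fakeActivities{failTranscribe: true}, "talk.mp4")
	fmt.Println(err)
}
```

Both the happy path and the failure path run in-process, with no queues or databases involved, which is the property the quote above is pointing at.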

Migration to Temporal

The migration process was relatively simple. Transcription is a self-contained service that doesn’t rely on Descript’s core databases, so the team created a new workflow and progressively migrated workloads over using feature flags. The end results of the pipeline were checked via a single table, which made the port straightforward: it was effectively a switch from handwritten queues to Temporal task queues.
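A progressive, flag-driven migration of this kind is often implemented as a percentage rollout. The sketch below is one common way to do it, assuming jobs carry a stable string ID; the source does not say how Descript's flags were implemented, and `useTemporal`, `route`, and the queue names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// useTemporal reports whether a job should take the new path.
// rolloutPercent is in the range 0-100. Hashing the job ID makes the
// decision stable: the same job always lands on the same side.
func useTemporal(jobID string, rolloutPercent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(jobID))
	return h.Sum32()%100 < rolloutPercent
}

// route returns the (hypothetical) destination for a job.
func route(jobID string, rolloutPercent uint32) string {
	if useTemporal(jobID, rolloutPercent) {
		return "temporal-task-queue"
	}
	return "legacy-queue"
}

func main() {
	for _, pct := range []uint32{0, 25, 100} {
		n := 0
		for i := 0; i < 1000; i++ {
			if useTemporal(fmt.Sprintf("job-%d", i), pct) {
				n++
			}
		}
		fmt.Printf("rollout %d%%: %d of 1000 jobs on the new path\n", pct, n)
	}
}
```

Because the bucket is derived from the job ID rather than a random draw, raising the percentage only ever moves jobs from the legacy path to the new one, never back.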

Although the migration was prototyped with a self-hosted cluster, Descript shifted to Temporal Cloud for its deployment: the team didn’t want to manage Temporal themselves, and Temporal Cloud allowed them to scale more efficiently as their user base grew.

We migrated progressively using feature flags, and verified performance as we went, but we had confidence in the Temporal team so we didn’t stress about load testing.

Looking back

Temporal has helped Descript solve fundamental reliability issues with its core transcription loop. The frequency of production incidents has declined from once a week to virtually zero. Nicolas attributes this success to the ability to test everything with unit tests as well as normal end-to-end testing on a staging environment.

Testing is important, but Temporal has opened up further opportunities since the migration:

  • Orchestration of multiple Workflows: You can flexibly reuse a workflow once it is written, allowing the transcription workflow to be easily composed with other APIs. This helped Descript develop a major transcription correction feature much more quickly than with the previous architecture.
  • Specialized Compute for Machine Learning: Some parts of the transcription process use Python components that generate CUDA instructions. These must run on workers with GPUs, and their results must then be reliably merged back into a general data processing workflow. Temporal’s ability to route activities to multiple Task Queues made this straightforward.
  • Session Routing for Stateful Execution: You can fine-tune the execution of sequential activities on the same worker using Go SDK Sessions, which Descript expects to need but doesn’t yet use.
  • Better Observability: An unexpected advantage that Descript is enjoying is better observability of workflows, which helps improve customer service even when Descript is not at fault.
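The task queue routing mentioned in the list above comes down to choosing a queue name per step. In the Temporal Go SDK that name is supplied through the `TaskQueue` field of `workflow.ActivityOptions`, and only workers polling that queue pick up the activity. The sketch below shows just the selection logic in plain Go; the queue names, the `step` type, and `queueFor` are assumptions made for illustration, not Descript's configuration.

```go
package main

import "fmt"

// Hypothetical task queue names: one polled by general-purpose workers,
// one polled only by workers running on GPU machines.
const (
	cpuQueue = "transcription-cpu"
	gpuQueue = "transcription-gpu"
)

// step is a hypothetical description of one pipeline activity.
type step struct {
	Name     string
	NeedsGPU bool
}

// queueFor picks the task queue whose workers can run the step.
func queueFor(s step) string {
	if s.NeedsGPU {
		return gpuQueue
	}
	return cpuQueue
}

func main() {
	pipeline := []step{
		{Name: "reencode-audio", NeedsGPU: false},
		{Name: "run-model", NeedsGPU: true},
		{Name: "merge-results", NeedsGPU: false},
	}
	for _, s := range pipeline {
		fmt.Printf("%s -> %s\n", s.Name, queueFor(s))
	}
}
```

Keeping the hardware requirement on the step, rather than hard-coding queue names at each call site, means a new GPU-bound step needs only one flag set.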

When a customer reports an issue, it’s very easy for us to just put the Workflow ID into the Temporal Web UI to see what is going on… I can see a summary of workflow status and access the full event history for audit needs.

What’s next?

Now that Descript has production experience running on Temporal Cloud, they are looking to use Temporal for all their AI workflows.

Everyone in the company can see more and more use cases where Temporal would be useful.
