How Goldcast scaled event orchestration to millions using Temporal

AUTHORS
Kunal Verma, Himalaya Gahlot
DATE
Jan 20, 2026
DURATION
10 MIN
This is a guest post from Kunal Verma, Staff Software Engineer, and Himalaya Gahlot, Software Engineer at Goldcast.io. In this post, they share how Goldcast uses Temporal Workflows to reliably send large-scale event emails and to orchestrate event duplication with rollback-safe and async workflow phases.

Marketing may not seem like the most stressful gig, but if you’ve ever run large-scale marketing operations, you know just how difficult the operational side can get.

Behind the shiny veneer of personalized campaigns lies a snarl of fragile scripts and tense 5 a.m. Slack pings asking why the keynote reminder email didn’t go out.

Luckily, Goldcast sits right in the middle of this reality, cleaning up the mess. Our platform was created to help marketing teams turn live and on-demand video into pipeline. The challenge is that this also means we live with the operational consequences of delivering hundreds of thousands of messages, registrations, and integrations on time. Every time.

For years, we needed reliable systems to ensure that our platform delivered our features to users at scale. Then, we found Temporal. What started as an experiment to see what the product could do for us quickly became the backbone of our email delivery and event duplication.

This blog is a tour of how we got here and a testament to why we’re never going back to our pre-Temporal ways.

What Goldcast does#

So, as we’ve mentioned, Goldcast is a B2B event marketing platform built for turning live and on-demand video into measurable pipeline.

Beyond this, though, marketers use us to host webinars, virtual conferences, and hybrid events. Our product helps them automatically capture engagement data, feed it into their CRM, and follow up with attendees in a way that will actually drive their revenue.

From the outside, it looks like a seamless experience, but behind the scenes, our engineering team is orchestrating so much more than video. We’re moving massive volumes of attendee and registrant data, firing off targeted emails, and syncing with multiple marketing and sales tools, to name just a few tasks. What makes things even trickier is keeping everything reliable at scale.

Goldcast’s engineering infrastructure and challenges#

One thing about us is that our scale is no joke. On average, we’re looking at:

  • 100,000 to 200,000 emails for a single large event.
  • Dozens of concurrent live events with their own registration flows.
  • Real-time processing of attendee engagement and post-event follow-up.

Pretty high stakes stuff that our customers are truly counting on.

The thing is, our legacy system did work, but it wasn’t reliable. Retries were fragile, and state was scattered across queues, databases, and the kind of unwritten knowledge that lives in the heads of long-tenured developers.

How we’re using Temporal#

We’re putting Temporal to use in a couple of key ways.

Email reliability at scale#

The problem

We needed to reliably process email requests for recipient lists of 30,000 people or more. Beyond that, the new architecture had to scale out to mass sends in the range of 100,000 to 200,000 emails at a time.

The solution

We now use Temporal Workflows for the entire delivery process, start to finish. We break the emails into multiple batches and then run the batches in parallel.

The Workflow handles email processing with validation, proper error handling, and support for both batch emails and role-based emails. The process uses Temporal’s built-in features for workflow management and deduplication.

[Image: email processing Workflow diagram]

You can see this illustrated in the above image and the specifics of the Workflow code below.

# Simplified Workflow sketch; the helpers called here run as Temporal Activities
async def process_email_workflow(request):
    """Orchestrates email processing with smart batching and error handling"""
    # Validate and determine email type
    validation = await validate_email_request(request)

    if validation.is_role_based:
        users = await get_role_based_users(validation.role)
        results = await process_email_batch(users, validation.template_data)
    else:
        # Process the first batch immediately; it also returns pagination info
        first_batch = await get_users_batch(offset=0)
        total_batches = first_batch.total_pages

        # Start sending the first batch right away
        first_result = await process_email_batch(first_batch.users, validation.template_data)

        # Concurrently fetch the remaining user pages
        remaining_batches = await get_remaining_batches_parallel(total_batches)

        # Process all remaining batches in parallel
        batch_results = await process_batches_parallel(remaining_batches)

        # Aggregate results from every batch
        results = aggregate_results([first_result] + batch_results)

    return format_final_output(results)

So why does this work? There are a few design patterns at play here that make this system both scalable and reliable:

  • A smart batching strategy: The first batch runs sequentially to grab pagination metadata and start work immediately. The remaining batches are fetched and processed in parallel so the emails can go out simultaneously across the entire list.
  • Resilient error handling: Activities return structured error objects instead of throwing exceptions, which makes failures easier to manage. Partial successes are preserved (one failed batch doesn’t block the rest) and even empty user lists are handled cleanly.
  • Performance optimizations: Templates are uploaded once and reused across batches, parallel execution maximizes throughput, and shared resources are coordinated across activities to avoid duplication.
  • Workflow routing: Early validation decides whether the request is role-based or bulk, but both paths eventually converge on the same processing activity. Results are then aggregated into a consistent format that’s easy to monitor.
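Stripped of Temporal specifics, the batching and structured-error patterns above can be sketched in plain asyncio. All names below are hypothetical; in production each call would be a Temporal Activity with its own retry policy:

```python
# Illustrative sketch: fixed-size batches processed in parallel, with
# structured error objects instead of exceptions so partial successes survive.
import asyncio
from dataclasses import dataclass, field

BATCH_SIZE = 500  # illustrative batch size

@dataclass
class BatchResult:
    sent: int = 0
    errors: list = field(default_factory=list)  # structured errors, not exceptions

async def process_batch(recipients: list[str]) -> BatchResult:
    result = BatchResult()
    for email in recipients:
        if "@" not in email:
            # Record a structured error instead of raising, so one bad
            # address never fails the whole batch.
            result.errors.append({"recipient": email, "reason": "invalid_address"})
        else:
            result.sent += 1
    return result

async def send_all(recipients: list[str]) -> BatchResult:
    # Split the list into fixed-size batches and process them in parallel.
    batches = [recipients[i:i + BATCH_SIZE]
               for i in range(0, len(recipients), BATCH_SIZE)]
    results = await asyncio.gather(*(process_batch(b) for b in batches))
    # Aggregate: partial successes are preserved even when some entries fail.
    total = BatchResult()
    for r in results:
        total.sent += r.sent
        total.errors.extend(r.errors)
    return total

recipients = [f"user{i}@example.com" for i in range(1200)] + ["not-an-email"]
summary = asyncio.run(send_all(recipients))
```

One failing recipient lands in `errors` while the other 1,200 still go out, which is the "one failed batch doesn’t block the rest" property described above.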

Since rolling this Workflow into production, the results have been immediate and measurable.

We’re now seeing a 99.9% success rate on large email sends with 20,000+ recipients, and processing times have dropped significantly. In fact, 50,000 emails now go out in about 10 minutes. Deduplication is handled out of the box by the workflow itself, which has eliminated an entire class of failure we used to guard against manually. And because every send runs through Temporal, we have a complete audit trail for each email request, making it far easier to troubleshoot issues or answer questions when something doesn’t go as planned.
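That out-of-the-box deduplication comes largely from Temporal’s Workflow ID semantics: by default, the server rejects a second start that reuses the ID of an already-running Workflow. Here’s a minimal, hypothetical sketch of deriving a deterministic ID for a send (none of these names are Goldcast’s actual code):

```python
# Hypothetical sketch: derive a deterministic Workflow ID from the request so
# that duplicate submissions of the same logical send collapse into one
# execution. EmailRequest and email_workflow_id are illustrative names.
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailRequest:
    event_id: str
    template_id: str
    audience: str  # e.g. "registrants", "speakers"

def email_workflow_id(req: EmailRequest) -> str:
    """Same logical send -> same Workflow ID -> Temporal rejects the duplicate."""
    key = f"{req.event_id}:{req.template_id}:{req.audience}"
    digest = hashlib.sha256(key.encode()).hexdigest()[:16]
    return f"email-send-{digest}"

# Two identical requests map to the same Workflow ID...
a = email_workflow_id(EmailRequest("evt-1", "tmpl-keynote", "registrants"))
b = email_workflow_id(EmailRequest("evt-1", "tmpl-keynote", "registrants"))
# ...while a different audience gets its own ID.
c = email_workflow_id(EmailRequest("evt-1", "tmpl-keynote", "speakers"))
```

Such an ID would then be passed when starting the Workflow, so a retried request or a double-clicked send button can’t trigger the same send twice.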

Event duplication orchestration#

Temporal is also a key factor in the orchestration of our event duplication.

The problem

In our platform, duplicating events involves orchestrating dozens of interconnected operations — from cloning core event data to configuring media pipelines, migrating assets, and managing user associations. Without proper state management, any intermittent failure could leave the system in an inconsistent state where some entities are duplicated while others aren’t, creating data integrity issues and requiring manual intervention.

The solution

To solve this, we implemented a two-phase Temporal Workflow design:

Phase 1: Main event duplication Workflow (atomic and rollback-safe)

This synchronous workflow handles the core transactional operations with built-in rollback mechanisms. If any step fails, the Workflow ensures proper cleanup before returning an error, preventing orphaned resources or partial duplication states.

Key design principles:

  • Pre-validation: System state checks before any modifications
  • Atomic cloning: Core entity duplication within a single transaction
  • Clean rollback: Automatic cleanup of partially created resources on failure
  • Guaranteed outputs: Either complete success with all IDs or complete failure with cleanup
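The "clean rollback" principle can be pictured as a small compensation stack: every step that creates something registers an undo action, and any failure unwinds the completed steps in reverse. A hypothetical, Temporal-free illustration (all names are ours, not Goldcast’s actual code):

```python
# Hypothetical sketch of rollback-safe duplication: each completed step
# registers a compensation; on failure, compensations run in reverse (LIFO).

class DuplicationFailed(Exception):
    pass

def duplicate_event(steps):
    """steps: list of (do, undo) pairs. Returns all outputs on success;
    on any failure, undoes completed work and raises DuplicationFailed."""
    compensations = []
    outputs = []
    try:
        for do, undo in steps:
            outputs.append(do())
            compensations.append(undo)
    except Exception as exc:
        for undo in reversed(compensations):
            undo()  # unwind partially created resources
        raise DuplicationFailed(str(exc)) from exc
    return outputs  # guaranteed outcome: all IDs, or nothing left behind

# Demo: the third step fails, so the first two are rolled back.
created = []

def make_step(name):
    def do():
        created.append(name)
        return f"{name}-id"
    def undo():
        created.remove(name)
    return do, undo

def failing_step():
    raise RuntimeError("asset copy failed")

steps = [make_step("event"), make_step("media"), (failing_step, lambda: None)]
rolled_back = False
try:
    duplicate_event(steps)
except DuplicationFailed:
    rolled_back = True  # created is empty again: no orphaned resources
```

In the real system each `do` would be an Activity inside a transaction, but the guarantee is the same: either complete success with all IDs, or complete failure with cleanup.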

[Image: main event duplication Workflow diagram]

Phase 2: Async duplication Workflow (background processing)

So now we move on to the asynchronous duplication. Once the core event duplication finishes successfully, we hand off everything that doesn’t need to block the user to a second, asynchronous workflow. This is where we take care of the slower, messier work. These are the kinds of tasks that are important, but shouldn’t slow down the main experience.

That async Workflow handles things like notifications and verification steps, cross-system integrations, quality assurance checks, and any long-running operations that would otherwise drag out the duplication process. By isolating this work, we keep the critical path clean while still ensuring everything completes reliably in the background.

This separation of concerns has been a big win for us. The main Workflow stays fast, which means users get immediate confirmation that their event has been duplicated. If something fails in the background (say, an external service times out), it doesn’t compromise the core event data. Those failures can be retried independently, and because they run in their own Workflows, we get clear visibility into what succeeded, what failed, and why.

The diagram illustrates how the main duplication Workflow completes first, then triggers the async Workflow as a follow-on step, allowing background processing to continue safely without blocking the user experience.

[Image: two-phase duplication Workflow diagram]

Here’s a simplified view of how the two phases are wired together in practice:

# Simplified Workflow structure
@workflow.defn
class EventDuplicationWorkflow:
    @workflow.run
    async def run(self, params: DuplicationParams) -> WorkflowResult:
        # Phase 1: Core duplication with rollback protection
        result = await self.execute_main_duplication(params)

        if result.success:
            # Phase 2: hand off to a detached child Workflow. ABANDON lets
            # the child keep running after this Workflow returns (a bare
            # asyncio.create_task would be cancelled when the parent completes).
            await workflow.start_child_workflow(
                AsyncDuplicationWorkflow.run,
                result,
                parent_close_policy=workflow.ParentClosePolicy.ABANDON,
            )

        return result

Since rolling this architecture out, the impact has been clear. Event duplication now succeeds 99.8% of the time, up from 87% previously. Manual cleanup requests from our support team have dropped to zero. The main duplication flow completes in an average of 2.3 seconds, and every attempt (successful or not) comes with a full audit trail engineers can inspect when something goes wrong.

Why Temporal?#

We chose Temporal over other tooling options because it’s open source and a natural fit for a startup like ours.

If you’re a startup like us, then you understand why we can’t afford to waste cycles firefighting with our systems. Temporal gave us a way to solve these problems once and move on.

While we encountered a bit of an upfront learning curve, the challenge was well worth it. Temporal meaningfully changed the way our team thinks about workflows and activities, and once the team got it, we saw the payoff immediately.

As Kostub, our Head of Engineering, put it:

“Our team was living in a tedious loop of fix, patch, and repeat. Temporal helped us break free of that cycle. Every hour we used to spend on operational firefighting is an hour we now spend building.”

And Kapil, our Senior Director of Engineering, also noted:

“As leaders, we spend a lot of time protecting our teams from unnecessary operational stress. Temporal helped us do that. It reduced the cognitive load on our engineers and replaced a lot of fragile, bespoke logic with something observable. That’s been a big win for us.”

What’s next?#

We plan on continuing to extract all the value we can from Temporal. We’re now looking to expand into more services: processing bulk event registrations without failure, optimizing repeating events, and simplifying our analytics and reporting.

Since we love the open-source community, we’re also planning to contribute. We’d like to share some of the patterns and utilities we’ve built for idempotency, batching, and failure handling. The biggest win for us throughout this process is that we’ve built a foundation that we can trust even on our most stressful days.

If you’re curious about how we use live and on-demand video to help marketing teams drive real pipeline (powered by all the dev elbow grease you just read about), make sure you check us out.
