Code Exchange - End-To-End Deep Research Agent with Temporal

This workshop walks through building a production-ready AI research agent from scratch, with a strong focus on reliability, durability, and scalability using Temporal.io.

The project uses DataTalks.Club podcast data as a real-world dataset.

What you’ll build

You’ll implement an end-to-end pipeline that:

Fetches YouTube transcripts
- Download transcripts programmatically from YouTube
- Handle IP address blocks, network errors, and proxy configuration
Indexes data in Elasticsearch
- Design custom analyzers for better search quality
- Store and query long-form transcript data efficiently
Creates a deep research AI agent
- Use Pydantic AI to build a multi-stage research agent
- Enable tool calling for search and summarization
- Generate structured reports based on real videos
Makes everything reliable with Temporal.io
- Convert ingestion pipelines into durable Temporal workflows
- Add automatic retries, backoff, and crash recovery
- Run long-lived AI agents that preserve state across failures
- Observe execution via the Temporal Web UI
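
To make "automatic retries, backoff" concrete, here is a minimal plain-Python sketch of what a retry policy with exponential backoff does. It is not the Temporal API and not the workshop's code. The function name `retry_with_backoff` and its parameters are hypothetical, chosen only for illustration. A workflow engine such as Temporal applies this kind of policy for you and also persists progress, which a hand-written loop does not.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    initial_interval: float = 1.0,
    backoff_coefficient: float = 2.0,
    max_interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is used up.

    All names and defaults here are illustrative assumptions.
    `sleep` is injectable so the wait can be replaced in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    interval = initial_interval
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait, then grow the delay for the next attempt (capped).
            sleep(min(interval, max_interval))
            interval *= backoff_coefficient

    raise RuntimeError("unreachable")  # the loop always returns or raises
```

With the defaults, a call that fails twice and then succeeds waits 1 second, then 2 seconds, before returning. The sketch keeps its state in memory, so a process crash loses the attempt count; that gap is what durable workflows are meant to close.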

Problems this project solves

Unreliable transcript ingestion: YouTube downloads often fail due to IP blocking, rate limits, proxy issues, and network errors.
Lost progress on failures: Notebook- and script-based pipelines require manual restarts when something breaks.
Complex retry handling: Retrying failed YouTube and Elasticsearch operations is hard to implement correctly.
Lack of visibility: It’s difficult to see execution progress and failures without a workflow engine.
Fragile AI agent execution: LLM-based research can fail mid-run due to API or network errors, losing conversation state.
Context limits for long transcripts: Full podcast transcripts exceed model context windows and require summarization before reasoning.
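
One common way to work around context limits is to split a long transcript into overlapping chunks and summarize each chunk separately. The sketch below shows only the splitting step, under stated assumptions: the function name `chunk_transcript` is hypothetical, word count stands in as a rough proxy for model tokens, and the source does not say how the workshop itself splits transcripts.

```python
from typing import List


def chunk_transcript(text: str, max_words: int = 500, overlap: int = 50) -> List[str]:
    """Split `text` into word-based chunks that overlap by `overlap` words.

    Illustrative only: real token counts depend on the model's tokenizer.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the transcript
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbors, at the cost of summarizing a few words twice.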

Key technologies

Temporal (Python SDK)
Elasticsearch
Pydantic AI
OpenAI models
YouTube Transcript API
Docker & uv for environment management

Resources

📹 Workshop video: https://www.youtube.com/watch?v=N1gaI3Qz6vw
💻 Source code: https://github.com/alexeygrigorev/workshops/tree/main/temporal.io

By the end of the workshop, you’ll have a durable AI research agent that can survive crashes, retry automatically, and scale.

Learn how to ingest YouTube transcripts reliably, index them in Elasticsearch, and run multi-stage AI research workflows that survive failures and retries.