End-To-End Deep Research Agent with Temporal

Learn how to ingest YouTube transcripts reliably, index them in Elasticsearch, and run multi-stage AI research workflows that survive failures and retries.


This workshop walks through building a production-ready AI research agent from scratch, with a strong focus on reliability, durability, and scalability using Temporal.io.

The project uses DataTalks.Club podcast data as a real-world dataset.

What you’ll build

You’ll implement an end-to-end pipeline that:

  1. Fetches YouTube transcripts

    • Download transcripts programmatically from YouTube
    • Handle IP address blocks, network errors, and proxy configuration
  2. Indexes data in Elasticsearch

    • Design custom analyzers for better search quality
    • Store and query long-form transcript data efficiently
  3. Creates a deep research AI agent

    • Use Pydantic AI to build a multi-stage research agent
    • Enable tool calling for search and summarization
    • Generate structured reports based on real videos
  4. Makes everything reliable with Temporal.io

    • Convert ingestion pipelines into durable Temporal workflows
    • Add automatic retries, backoff, and crash recovery
    • Run long-lived AI agents that preserve state across failures
    • Observe execution via the Temporal Web UI
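
To make the "automatic retries, backoff" item in step 4 concrete, here is a minimal plain-Python sketch of retrying with exponential backoff. It is not Temporal's API and not the workshop's code: the name `retry_with_backoff` and all of its parameters are hypothetical, chosen only for illustration. A workflow engine such as Temporal lets you declare this behaviour as a retry policy instead of writing the loop by hand, and it additionally persists progress across process crashes, which this sketch does not.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is reached.

    Hypothetical helper: the wait between attempts starts at `initial_delay`
    seconds, is multiplied by `backoff` after each failure, and is capped at
    `max_delay`. `sleep` is injectable so the loop can be tested without
    actually waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            sleep(min(delay, max_delay))
            delay *= backoff

    # The loop always returns or raises; this line only satisfies type checkers.
    raise RuntimeError("unreachable")
```

A call might look like `retry_with_backoff(lambda: fetch_transcript(video_id))`, where `fetch_transcript` stands in for whatever download function you use. Note that state lives only in memory here: if the process dies mid-loop, the attempt count and any completed work are lost, which is the gap durable workflows are meant to close.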

Problems this project solves

  • Unreliable transcript ingestion: YouTube downloads often fail due to IP blocking, rate limits, proxy issues, and network errors.
  • Lost progress on failures: Notebook- and script-based pipelines require manual restarts when something breaks.
  • Complex retry handling: Retrying failed YouTube and Elasticsearch operations is hard to implement correctly.
  • Lack of visibility: It’s difficult to see execution progress and failures without a workflow engine.
  • Fragile AI agent execution: LLM-based research can fail mid-run due to API or network errors, losing conversation state.
  • Context limits for long transcripts: Full podcast transcripts exceed model context windows and require summarization before reasoning.
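
The last point, transcripts that exceed a model's context window, is commonly handled by splitting the text into overlapping chunks and summarizing each chunk separately. The sketch below shows only the splitting step, under stated assumptions: `chunk_transcript` is a hypothetical helper, word counts are used as a rough stand-in for model tokens, and the default sizes are arbitrary. How the workshop itself prepares transcripts for summarization is not shown here.

```python
def chunk_transcript(text: str, max_words: int = 500, overlap: int = 50) -> list[str]:
    """Split a transcript into overlapping word-based chunks.

    Hypothetical helper: each chunk holds at most `max_words` words, and
    consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of them.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in the range [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks: list[str] = []

    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the transcript,
        # so no trailing chunk made up only of overlap words is produced.
        if start + max_words >= len(words):
            break

    return chunks
```

Each chunk could then be summarized independently and the summaries combined before the agent reasons over them. Counting words rather than tokens keeps the example dependency-free; for real limits you would measure with the tokenizer of the model you call.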

Key technologies

  • Temporal (Python SDK)
  • Elasticsearch
  • Pydantic AI
  • OpenAI models
  • YouTube Transcript API
  • Docker & uv for environment management

Resources

By the end of the workshop, you’ll have a durable AI research agent that can survive crashes, retry automatically, and scale.


Language

Python

Temporal Verified

✅ Reviewed
💖 Community

About the Author

Alexey Grigorev

DataTalks.Club