End-To-End Deep Research Agent with Temporal
Learn how to ingest YouTube transcripts reliably, index them in Elasticsearch, and run multi-stage AI research workflows that survive failures through automatic retries.
This workshop walks through building a production-ready AI research agent from scratch, with a strong focus on reliability, durability, and scalability using Temporal.io.
The project uses DataTalks.Club podcast data as a real-world dataset.
What you’ll build
You’ll implement an end-to-end pipeline that:
- Fetches YouTube transcripts
  - Download transcripts programmatically from YouTube
  - Handle IP address blocks, network errors, and proxy configuration
- Indexes data in Elasticsearch
  - Design custom analyzers for better search quality
  - Store and query long-form transcript data efficiently
- Creates a deep research AI agent
  - Use Pydantic AI to build a multi-stage research agent
  - Enable tool calling for search and summarization
  - Generate structured reports based on real videos
- Makes everything reliable with Temporal.io
  - Convert ingestion pipelines into durable Temporal workflows
  - Add automatic retries, backoff, and crash recovery
  - Run long-lived AI agents that preserve state across failures
  - Observe execution via the Temporal Web UI
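Downloaded transcripts usually have to be flattened into text before they can be indexed. The following is a minimal sketch of that step. It assumes each transcript entry is a dict with `text`, `start` (seconds), and `duration` keys, which is the raw shape youtube-transcript-api returns. The helper names `format_timestamp` and `transcript_to_text` are hypothetical, and the workshop's own format may differ. The download itself, including any proxy setup, is left to the library.

```python
# Sketch: turn raw transcript entries into timestamped plain text for indexing.
# Assumes entries are dicts with "text", "start" (seconds) and "duration";
# the function names are hypothetical, not taken from the workshop repository.

def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS for long videos, or M:SS otherwise."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def transcript_to_text(entries: list[dict]) -> str:
    """Join entries into one line per caption, prefixed with its start time."""
    lines = []
    for entry in entries:
        # Captions often contain embedded newlines; collapse all whitespace runs.
        text = " ".join(entry["text"].split())
        if text:  # skip empty or whitespace-only captions
            lines.append(f"{format_timestamp(entry['start'])} {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"text": "Hi everyone,\nwelcome", "start": 0.0, "duration": 2.5},
        {"text": "to the podcast", "start": 65.2, "duration": 1.8},
        {"text": "", "start": 3725.0, "duration": 1.0},
    ]
    print(transcript_to_text(sample))
    # 0:00 Hi everyone, welcome
    # 1:05 to the podcast
```

Keeping the timestamps in the indexed text lets a report point back to roughly where in an episode a claim was made.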
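A custom analyzer in Elasticsearch is a tokenizer followed by a chain of token filters, declared in the index settings and referenced from the field mappings. The sketch below shows one possible English-oriented setup for transcript fields. The index name `podcasts`, the field names `video_id`, `title`, and `subtitles`, and the filter chain are all illustrative assumptions, not the workshop's actual mapping.

```python
# Sketch of Elasticsearch index settings with a custom English analyzer.
# Index name, field names and the filter chain are illustrative assumptions.

INDEX_NAME = "podcasts"  # hypothetical index name

ANALYSIS = {
    "filter": {
        # Drop common English stop words such as "the" and "and".
        "english_stop": {"type": "stop", "stopwords": "_english_"},
        # Reduce words to their stems, e.g. "videos" -> "video".
        "english_stemmer": {"type": "stemmer", "language": "english"},
    },
    "analyzer": {
        "transcript_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            # Filters run in order: lowercase first, then stop words, then stemming.
            "filter": ["lowercase", "english_stop", "english_stemmer"],
        }
    },
}

SETTINGS = {"analysis": ANALYSIS}

MAPPINGS = {
    "properties": {
        "video_id": {"type": "keyword"},  # exact-match identifier, not analyzed
        "title": {"type": "text", "analyzer": "transcript_analyzer"},
        "subtitles": {"type": "text", "analyzer": "transcript_analyzer"},
    }
}
```

With the official Python client (8.x), dicts like these can be passed as `es.indices.create(index=INDEX_NAME, settings=SETTINGS, mappings=MAPPINGS)`. Keeping `video_id` as a `keyword` field allows exact lookups and filtering by episode.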
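A search tool that a research agent calls usually comes down to building a query and returning a small set of hits. Here is a minimal sketch of the query-building half. It assumes a `multi_match` query over hypothetical `title` and `subtitles` fields. The function name `build_search_body` and the title boost are illustrative choices, not taken from the workshop code.

```python
# Sketch of a request body an agent's search tool could send to Elasticsearch.
# Field names ("title", "subtitles", "video_id") and the boost are assumptions.

def build_search_body(query: str, size: int = 5) -> dict:
    """Build an Elasticsearch request body for a full-text transcript search."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query,
                # "title^2" doubles the weight of matches in the title field.
                "fields": ["title^2", "subtitles"],
                "type": "best_fields",
            }
        },
        # Return only lightweight fields; full transcripts can be very long.
        "_source": ["video_id", "title"],
    }


if __name__ == "__main__":
    print(build_search_body("durable workflows", size=3))
```

In a Pydantic AI agent, a function that sends a body like this to Elasticsearch and returns the hits could be registered as a tool, for example with `@agent.tool_plain`. Returning only IDs and titles keeps tool output small, so the agent can decide which transcripts are worth summarizing.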
Problems this project solves
- Unreliable transcript ingestion: YouTube downloads often fail due to IP blocking, rate limits, proxy issues, and network errors.
- Lost progress on failures: Notebook- and script-based pipelines require manual restarts when something breaks.
- Complex retry handling: Retrying failed YouTube and Elasticsearch operations is hard to implement correctly.
- Lack of visibility: It’s difficult to see execution progress and failures without a workflow engine.
- Fragile AI agent execution: LLM-based research can fail mid-run due to API or network errors, losing conversation state.
- Context limits for long transcripts: Full podcast transcripts exceed model context windows and require summarization before reasoning.
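Temporal replaces hand-written retry loops with a declarative retry policy. The delay before each retry grows by `backoff_coefficient` and is capped at `maximum_interval`. By default, Temporal uses a 1-second initial interval, a coefficient of 2.0, a cap of 100 times the initial interval, and unlimited attempts. The pure-Python sketch below only computes that delay schedule so the arithmetic is visible. Its parameter names mirror `temporalio.common.RetryPolicy`, but it is not how the Temporal server implements retries, and it requires a finite attempt count.

```python
# Sketch: compute the backoff schedule a Temporal-style retry policy describes.
# Parameter names mirror temporalio.common.RetryPolicy; this only does arithmetic.
from datetime import timedelta
from typing import Optional


def retry_delays(
    initial_interval: timedelta = timedelta(seconds=1),
    backoff_coefficient: float = 2.0,
    maximum_interval: Optional[timedelta] = None,
    maximum_attempts: int = 5,
) -> list[timedelta]:
    """Return the wait before each retry (attempts 2..maximum_attempts)."""
    if maximum_attempts < 1:
        # Temporal treats 0 as "unlimited"; this sketch needs a finite count.
        raise ValueError("maximum_attempts must be >= 1 in this sketch")
    if maximum_interval is None:
        # When unset, the interval is capped at 100x initial_interval.
        maximum_interval = initial_interval * 100
    delays = []
    for retry in range(maximum_attempts - 1):  # attempt 1 runs immediately
        delay = initial_interval * (backoff_coefficient ** retry)
        delays.append(min(delay, maximum_interval))
    return delays


if __name__ == "__main__":
    schedule = retry_delays(timedelta(seconds=2), 2.0, timedelta(seconds=10), 5)
    print([d.total_seconds() for d in schedule])  # [2.0, 4.0, 8.0, 10.0]
```

In the Temporal Python SDK, an equivalent policy would be attached to an activity call, e.g. `workflow.execute_activity(fetch_transcript, video_id, start_to_close_timeout=timedelta(minutes=5), retry_policy=RetryPolicy(initial_interval=timedelta(seconds=2), maximum_interval=timedelta(seconds=10), maximum_attempts=5))`. Here `fetch_transcript` is a hypothetical activity name.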
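One common way to handle transcripts that exceed the context window is to split them into overlapping chunks and summarize each chunk separately. The agent then reasons over the combined summaries. The sketch below shows only the splitting step. It counts words as a rough stand-in for tokens, and its default chunk size and overlap are assumptions rather than values from the workshop.

```python
# Sketch: split a long transcript into overlapping word-based chunks so each
# piece fits in the model's context for separate summarization.
# Word counts approximate tokens; the default sizes are assumptions.

def chunk_words(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size words, overlapping by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each chunk's start moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    demo = " ".join(str(i) for i in range(10))
    print(chunk_words(demo, chunk_size=4, overlap=1))
    # ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, at the cost of summarizing a few words twice.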
Key technologies
- Temporal (Python SDK)
- Elasticsearch
- Pydantic AI
- OpenAI models
- YouTube Transcript API
- Docker & uv for environment management
Resources
- 📹 Workshop video: https://www.youtube.com/watch?v=N1gaI3Qz6vw
- 💻 Source code: https://github.com/alexeygrigorev/workshops/tree/main/temporal.io
By the end of the workshop, you’ll have a durable AI research agent that can survive crashes, retry automatically, and scale.
Language: Python
Temporal Verified: ✅ Reviewed
💖 Community

About the Author
Alexey Grigorev, DataTalks.Club