
tokuSolutions

OCR Translation with Temporal Workflows


Japanese toy manual translator demonstrating Temporal workflow patterns. I collect Kamen Rider transformation devices, and all of their documentation is in Japanese. This project automates translation while preserving the layout context that general-purpose translation tools lose. It extracts text from PDFs using Google Document AI, translates it to English, and generates an interactive web viewer.


What You'll Learn

Temporal workflow concepts:

  • Parent/child workflow orchestration
  • Fan-out/fan-in parallel execution
  • Activity retry policies tuned by operation type
  • Workflow queries for real-time progress
  • Deterministic workflow design
  • Activity heartbeats for long operations

Practical application:

  • Batch OCR processing with Google Document AI
  • Translation API integration
  • LLM-powered cleanup with structured validation
  • Interactive web viewer with inline editing

How Temporal Orchestrates Translation

Parent workflow (pdf_translation_workflow.py) orchestrates four child workflows:

1. OCRWorkflow - Extract text from PDF

  • Product search on Tokullectibles
  • Get PDF page count
  • Fan-out/fan-in: Parallel OCR across all pages (0-N simultaneously)

2. TranslationWorkflow - Translate extracted text

  • Batch translation via Google Translate API

3. SiteGenerationWorkflow - Generate web viewer

  • Convert pages to WebP images
  • Create translations.json and viewer HTML
  • Heartbeats every 5 pages (long-running activity)

4. CleanupWorkflow - Improve translation quality

  • Stage 1: ftfy - Fix Unicode/OCR corruption (deterministic)
  • Stage 2: Rule-based - Remove noise patterns (deterministic)
  • Stage 3: Gemini AI - Context-aware corrections + tagging (non-deterministic)
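The fan-out/fan-in step in OCRWorkflow can be sketched with plain asyncio. This is a minimal stand-in rather than the project's code: `ocr_page` is a hypothetical placeholder for the per-page Document AI activity, and in a real Temporal workflow each call would be scheduled through `workflow.execute_activity` instead of being invoked directly.

```python
import asyncio


async def ocr_page(pdf_path: str, page: int) -> str:
    """Hypothetical stand-in for the per-page Document AI activity."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"text from {pdf_path} page {page}"


async def ocr_all_pages(pdf_path: str, page_count: int) -> list[str]:
    # Fan-out: create one OCR coroutine per page so they run concurrently.
    tasks = [ocr_page(pdf_path, page) for page in range(page_count)]
    # Fan-in: gather waits for all of them and returns results in page order.
    return await asyncio.gather(*tasks)
```

Because `asyncio.gather` preserves the order of its arguments, the results line up with page numbers even when individual pages finish out of order.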

Why child workflows?

  • Separation in Temporal UI - each phase visible as distinct workflow execution
  • Independent lifecycle - each phase has own event history and retry logic
  • Query support - parent workflow exposes real-time progress
  • Observability - better visibility into which phase is executing or failed

Real-time Progress Tracking

CLI polls workflow using Temporal Queries (every 500ms) to display live progress:

📄 [1/4] OCR - Extracting text...
  ✓ Complete: 15 blocks from 5 pages
🌐 [2/4] Translation - Translating text...
  ✓ Complete: 15 blocks
🌍 [3/4] Site Generation - Creating viewer...
  ✓ Complete: 5 pages
✨ [4/4] Cleanup - Improving quality...
  ✓ Fixed 3 encoding issues
  ✓ Removed 2 noise blocks
  ✓ Applied 5 AI corrections

Parent workflow updates WorkflowProgress state with phase tracking and sub-progress from each child workflow.
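A rough sketch of how the progress state and the CLI polling loop might fit together. The field names on `WorkflowProgress` are assumptions (the source names the class but not its shape), and `fetch` is a hypothetical callable standing in for a Temporal query against the parent workflow; in the Python SDK that would presumably be a `@workflow.query` handler read through the workflow handle.

```python
import time
from dataclasses import dataclass
from typing import Callable

PHASES = ["OCR", "Translation", "Site Generation", "Cleanup"]


@dataclass
class WorkflowProgress:
    """Hypothetical fields; the source names the class but not its shape."""
    phase_index: int = 0   # 0-based position in PHASES
    status: str = ""       # sub-progress text reported by the child workflow
    done: bool = False     # True once the last phase has finished


def render(progress: WorkflowProgress) -> str:
    """Format one CLI line such as '[1/4] OCR - Extracting text...'."""
    phase = PHASES[progress.phase_index]
    return f"[{progress.phase_index + 1}/{len(PHASES)}] {phase} - {progress.status}"


def poll(fetch: Callable[[], WorkflowProgress], interval: float = 0.5) -> list[str]:
    """Call fetch() every `interval` seconds and collect each changed line."""
    lines: list[str] = []
    while True:
        progress = fetch()
        line = render(progress)
        if not lines or lines[-1] != line:
            lines.append(line)  # skip repeats so the display only shows changes
        if progress.done:
            return lines
        time.sleep(interval)
```

The default interval of 0.5 seconds matches the 500ms polling described above. Queries are read-only, so polling this often does not add events to the workflow history.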

Temporal Best Practices

1. Retry Policies - Three strategies tuned for operation types:

  • QUICK_RETRY: Fast operations (file I/O) - 3 attempts, 1-10s backoff
  • API_RETRY: External APIs (Document AI, Translation) - 5 attempts, 2-30s backoff
  • LLM_RETRY: AI model calls (Gemini) - 3 attempts, 5s-2min backoff
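To make the three policies concrete, here is a small sketch that computes the wait between attempts for each one. `RetrySpec` is a plain stand-in that mirrors the usual retry-policy fields (initial interval, maximum interval, backoff coefficient, maximum attempts); the coefficient of 2.0 is an assumption, since the source gives only the attempt counts and interval bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrySpec:
    """Plain stand-in mirroring the fields of a Temporal retry policy."""
    maximum_attempts: int
    initial_interval: float           # seconds before the first retry
    maximum_interval: float           # cap on any single wait, in seconds
    backoff_coefficient: float = 2.0  # assumed; not stated in the source

    def delays(self) -> list[float]:
        """Waits between attempts: N attempts means N - 1 retries."""
        return [
            min(self.initial_interval * self.backoff_coefficient ** i,
                self.maximum_interval)
            for i in range(self.maximum_attempts - 1)
        ]


QUICK_RETRY = RetrySpec(maximum_attempts=3, initial_interval=1, maximum_interval=10)
API_RETRY = RetrySpec(maximum_attempts=5, initial_interval=2, maximum_interval=30)
LLM_RETRY = RetrySpec(maximum_attempts=3, initial_interval=5, maximum_interval=120)
```

Under the assumed coefficient, none of the three policies reaches its maximum interval before running out of attempts, so the caps mainly act as a safety bound if the attempt counts are raised later.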

2. Activity Separation for Determinism

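Why three cleanup activities instead of one? Temporal workflow code must be deterministic, so all real work runs in activities. Splitting cleanup by behavior keeps the repeatable stages apart from the one whose output varies between runs: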

  • ftfy_cleanup_activity - Deterministic Unicode fixes (always same output)
  • rule_based_cleanup_activity - Deterministic pattern removal (regex patterns)
  • gemini_cleanup_activity - Non-deterministic AI corrections (LLM responses vary)

Benefits: Gemini can fail without breaking the workflow, each operation type gets its own retry policy, and each stage is visible in the Temporal UI.
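The separation can be illustrated with a dependency-free sketch. The noise patterns below are hypothetical (the project's actual rules are not shown), and `ai_cleanup` is a placeholder callable for the Gemini stage; the point is only that the deterministic result survives when the AI stage fails.

```python
import re
from typing import Callable

# Hypothetical noise patterns; the project's real rules are not shown.
NOISE_PATTERNS = [
    re.compile(r"^\d+$"),      # bare numbers, e.g. stray page numbers
    re.compile(r"^[\W_]+$"),   # blocks made only of punctuation or symbols
]


def rule_based_cleanup(blocks: list[str]) -> list[str]:
    """Deterministic: the same blocks always produce the same output."""
    return [
        block for block in blocks
        if not any(pattern.match(block.strip()) for pattern in NOISE_PATTERNS)
    ]


def run_cleanup(blocks: list[str],
                ai_cleanup: Callable[[list[str]], list[str]]) -> list[str]:
    cleaned = rule_based_cleanup(blocks)
    try:
        return ai_cleanup(cleaned)
    except Exception:
        # The AI stage is optional: fall back to the deterministic result.
        return cleaned
```

In the real workflow the fallback would sit around the activity call after its retries are exhausted, not around a direct function call.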

3. Heartbeats - Long-running activities send heartbeats to prevent timeouts:

  • Site generation: Every 5 pages during image rendering
  • Gemini cleanup: Before/after LLM API calls (2min timeout)
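The "every 5 pages" cadence can be sketched as a loop that calls a heartbeat function at a fixed interval. The `heartbeat` parameter is a stand-in so the example runs without a Temporal worker; inside a real activity it would presumably be `activity.heartbeat` from the Python SDK. The image conversion itself is omitted.

```python
from typing import Callable


def render_pages(page_count: int,
                 heartbeat: Callable[[int], None],
                 every: int = 5) -> int:
    """Process pages in order, reporting liveness every `every` pages."""
    rendered = 0
    for page in range(1, page_count + 1):
        # ... convert this page to a WebP image here (omitted) ...
        rendered += 1
        if page % every == 0:
            # Pass the page number as a detail so progress is visible.
            heartbeat(page)
    return rendered
```

Heartbeats only help if the activity also sets a heartbeat timeout shorter than its overall timeout; otherwise a stalled activity is not detected any sooner.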

4. Workflow Determinism - Non-deterministic operations moved outside workflows:

  • Path operations (stem, name extraction) moved to CLI layer
  • Manual name and output directory computed before workflow starts
  • Only deterministic data transformations in workflow code
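A minimal sketch of that split, assuming the CLI builds a plain dictionary of inputs before starting the workflow. The function name, the keys, and the `output` default are all hypothetical; the source only says that the manual name and output directory are computed up front.

```python
from pathlib import Path


def build_workflow_input(pdf_path: str, output_root: str = "output") -> dict[str, str]:
    """Run in the CLI layer, before the workflow starts."""
    path = Path(pdf_path)
    manual_name = path.stem  # file name without its extension
    return {
        "pdf_path": str(path),
        "manual_name": manual_name,
        "output_dir": str(Path(output_root) / manual_name),
    }
```

The workflow then receives only strings, so replaying its history never depends on the filesystem or on the machine the worker runs on.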

5. Type-Safe AI with Pydantic - LLM responses validated before reaching workflow:

from pydantic import BaseModel

class GeminiCleanupResponse(BaseModel):
    remove: list[str]           # Block indices to remove
    corrections: dict[str, str] # Index → corrected text
    product_name: str           # Official product name

Gemini returns JSON → Pydantic validates structure → Invalid responses trigger Temporal activity retry → Type safety across entire pipeline.
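The validate-then-retry flow can be sketched without any dependencies. Note the substitutions: a hand-written check stands in for Pydantic validation of `GeminiCleanupResponse`, and a simple loop stands in for Temporal's activity retry. `call_model` is a hypothetical callable that returns the raw model output as a string.

```python
import json
from typing import Callable


def parse_cleanup_response(raw: str) -> dict:
    """Hand-rolled stand-in for validating a GeminiCleanupResponse."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    remove = data.get("remove")
    corrections = data.get("corrections")
    if not (isinstance(remove, list) and all(isinstance(i, str) for i in remove)):
        raise ValueError("remove must be a list of strings")
    if not (isinstance(corrections, dict)
            and all(isinstance(v, str) for v in corrections.values())):
        raise ValueError("corrections must map strings to strings")
    if not isinstance(data.get("product_name"), str):
        raise ValueError("product_name must be a string")
    return data


def cleanup_with_retry(call_model: Callable[[], str], max_attempts: int = 3) -> dict:
    """Ask again whenever the response fails validation."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_cleanup_response(call_model())
        except ValueError as err:
            last_error = err  # invalid output: try another attempt
    raise RuntimeError("no valid response after retries") from last_error
```

Letting the validation error escape the activity is what hands the retry decision to Temporal, so the backoff settings apply without any loop in application code.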

Performance & Cost

Typical 20-page manual:

  • Time: ~50 seconds with single worker (30s OCR parallel, 5s translation, 10s AI cleanup, 5s site generation)
  • Cost: ~$0.43 (Document AI $0.03 + Translation API $0.40 + Gemini free)

Cost estimates (December 2024 pricing):

  • Small (5 pages): ~$0.11
  • Medium (20 pages): ~$0.43
  • Large (50 pages): ~$1.08
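The three estimates are consistent with scaling the 20-page figures linearly by page count. The sketch below makes that arithmetic explicit; the per-page rates are derived by dividing the 20-page costs by 20 and are not published prices, and real translation charges likely depend on how much text each page contains.

```python
from decimal import Decimal, ROUND_HALF_UP

# Derived from the 20-page example above, not from a price list.
DOCUMENT_AI_PER_PAGE = Decimal("0.03") / 20   # 0.0015 per page
TRANSLATION_PER_PAGE = Decimal("0.40") / 20   # 0.02 per page


def estimate_cost(pages: int) -> Decimal:
    """Estimated cost in dollars, assuming cost scales linearly with pages."""
    total = (DOCUMENT_AI_PER_PAGE + TRANSLATION_PER_PAGE) * pages
    # Decimal avoids float rounding surprises on values such as 0.1075.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Calling `estimate_cost` with 5, 20, and 50 pages reproduces the three figures in the list above.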

The Gemini 1.5 Flash free tier (15 RPM, 1M TPM, 1,500 RPD) is sufficient for hobby use.


Language

Python

Temporal Verified

✅ Reviewed
💖 Community

About the Author


Shy Ruparel

Senior Developer Advocate