Code Exchange - tokuSolutions

Japanese toy manual translator demonstrating Temporal workflow patterns. I collect Kamen Rider transformation devices, and all of their documentation is in Japanese. This project automates translation while preserving the layout context that translation tools not specialized in toy manuals tend to lose. It extracts text from PDFs using Google Document AI, translates it to English, and generates an interactive web viewer.


What You'll Learn

Temporal workflow concepts:

Parent/child workflow orchestration
Fan-out/fan-in parallel execution
Activity retry policies tuned by operation type
Workflow queries for real-time progress
Deterministic workflow design
Activity heartbeats for long operations

Practical application:

Batch OCR processing with Google Document AI
Translation API integration
LLM-powered cleanup with structured validation
Interactive web viewer with inline editing

How Temporal Orchestrates Translation

Parent workflow (pdf_translation_workflow.py) orchestrates four child workflows:

1. OCRWorkflow - Extract text from PDF

Product search on Tokullectibles
Get PDF page count
Fan-out/fan-in: Parallel OCR across all pages (0-N simultaneously)

2. TranslationWorkflow - Translate extracted text

Batch translation via Google Translate API

3. SiteGenerationWorkflow - Generate web viewer

Convert pages to WebP images
Create translations.json and viewer HTML
Heartbeats every 5 pages (long-running activity)

4. CleanupWorkflow - Improve translation quality

Stage 1: ftfy - Fix Unicode/OCR corruption (deterministic)
Stage 2: Rule-based - Remove noise patterns (deterministic)
Stage 3: Gemini AI - Context-aware corrections + tagging (non-deterministic)
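
The four phases above can be sketched in plain asyncio as a rough stand-in. This is not the project's code and does not use the Temporal SDK: the function names, the `Block` type, and the fake phase bodies are all hypothetical. It only illustrates the shape described above, a parent that awaits each phase in order, with the OCR phase fanning out one task per page and fanning back in.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Block:
    page: int
    text: str


async def ocr_page(page: int) -> Block:
    """Hypothetical stand-in for one OCR call on a single page."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return Block(page=page, text=f" raw-{page} ")


async def ocr_phase(page_count: int) -> list[Block]:
    # Fan-out: one coroutine per page. Fan-in: gather waits for all of
    # them and returns the results in the order they were submitted.
    return list(await asyncio.gather(*(ocr_page(p) for p in range(page_count))))


async def translation_phase(blocks: list[Block]) -> list[Block]:
    # Stand-in for a single batched translation request.
    return [Block(b.page, f"EN({b.text.strip()})") for b in blocks]


async def site_phase(blocks: list[Block]) -> int:
    # Stand-in for viewer generation; returns the number of pages written.
    return len({b.page for b in blocks})


async def cleanup_phase(blocks: list[Block]) -> list[Block]:
    # Stand-in for the three cleanup stages.
    return [Block(b.page, b.text.strip()) for b in blocks]


async def translate_manual(page_count: int) -> list[Block]:
    """Parent: runs the four phases strictly in sequence."""
    blocks = await ocr_phase(page_count)
    blocks = await translation_phase(blocks)
    await site_phase(blocks)
    return await cleanup_phase(blocks)


result = asyncio.run(translate_manual(3))
```

In a Temporal workflow each `await` on a phase would instead be a child workflow call, so every phase gets its own event history.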

Why child workflows?

Separation in Temporal UI - each phase visible as distinct workflow execution
Independent lifecycle - each phase has own event history and retry logic
Query support - parent workflow exposes real-time progress
Observability - better visibility into which phase is executing or failed

Real-time Progress Tracking

CLI polls workflow using Temporal Queries (every 500ms) to display live progress:

📄 [1/4] OCR - Extracting text...
  ✓ Complete: 15 blocks from 5 pages
🌐 [2/4] Translation - Translating text...
  ✓ Complete: 15 blocks
🌍 [3/4] Site Generation - Creating viewer...
  ✓ Complete: 5 pages
✨ [4/4] Cleanup - Improving quality...
  ✓ Fixed 3 encoding issues
  ✓ Removed 2 noise blocks
  ✓ Applied 5 AI corrections

Parent workflow updates WorkflowProgress state with phase tracking and sub-progress from each child workflow.
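
A minimal sketch of the polling loop, assuming a simplified `WorkflowProgress` shape (the real class's fields are not shown in this document) and a hypothetical `query` callable standing in for the Temporal query call. Returning `None` is this sketch's own convention for "workflow finished".

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

PHASES = ["OCR", "Translation", "Site Generation", "Cleanup"]


@dataclass
class WorkflowProgress:
    phase_index: int = 0  # 0-based index into PHASES
    completed: list[str] = field(default_factory=list)  # finished sub-steps


def render(progress: WorkflowProgress) -> str:
    """Format one snapshot in the '[n/4] Phase' style shown above."""
    header = f"[{progress.phase_index + 1}/{len(PHASES)}] {PHASES[progress.phase_index]}"
    return "\n".join([header] + [f"  ✓ {msg}" for msg in progress.completed])


def poll(query: Callable[[], Optional[WorkflowProgress]], interval: float = 0.5) -> list[str]:
    """Call `query` until it returns None, recording each distinct rendering."""
    seen: list[str] = []
    while (snapshot := query()) is not None:
        text = render(snapshot)
        if not seen or seen[-1] != text:  # skip unchanged snapshots
            seen.append(text)
        time.sleep(interval)
    return seen
```

A CLI would print each new rendering as it arrives rather than collecting them in a list.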

Temporal Best Practices

1. Retry Policies - Three strategies tuned for operation types:

QUICK_RETRY: Fast operations (file I/O) - 3 attempts, 1-10s backoff
API_RETRY: External APIs (Document AI, Translation) - 5 attempts, 2-30s backoff
LLM_RETRY: AI model calls (Gemini) - 3 attempts, 5s-2min backoff
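
To make the three policies concrete, here is a small sketch that computes the wait before each retry. `RetrySpec` is a hypothetical helper, not a Temporal type, and the backoff coefficient of 2.0 is an assumption (it is Temporal's default, but the text above does not state which value the project uses).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrySpec:
    attempts: int             # total attempts, including the first
    initial_s: float          # delay before the first retry
    max_s: float              # cap on any single delay
    coefficient: float = 2.0  # assumed multiplier; not stated in the text

    def delays(self) -> list[float]:
        """Delay before each retry: initial * coefficient**n, capped at max_s."""
        return [
            min(self.initial_s * self.coefficient ** n, self.max_s)
            for n in range(self.attempts - 1)
        ]


QUICK_RETRY = RetrySpec(attempts=3, initial_s=1, max_s=10)
API_RETRY = RetrySpec(attempts=5, initial_s=2, max_s=30)
LLM_RETRY = RetrySpec(attempts=3, initial_s=5, max_s=120)
```

In the Temporal Python SDK the same four numbers map onto the `maximum_attempts`, `initial_interval`, `maximum_interval`, and `backoff_coefficient` fields of `RetryPolicy`.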

2. Activity Separation for Determinism

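Why three cleanup activities instead of one? Temporal workflow code must be deterministic, so non-deterministic work has to live in activities, and splitting cleanup by determinism keeps each stage independently retryable: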

ftfy_cleanup_activity - Deterministic Unicode fixes (always same output)
rule_based_cleanup_activity - Deterministic pattern removal (regex patterns)
gemini_cleanup_activity - Non-deterministic AI corrections (LLM responses vary)

Benefits: Gemini can fail without breaking workflow, different retry policies per operation type, each stage visible in Temporal UI.
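
The failure-isolation benefit can be sketched without Temporal at all. Everything here is illustrative: `unicodedata.normalize` is only a stand-in for ftfy (it does not repair mojibake the way ftfy does), the noise regex is a made-up rule, and `ai_cleanup` is a hypothetical callable representing the Gemini stage.

```python
import re
import unicodedata
from typing import Callable

# Hypothetical noise rule: a block with no letters or digits at all.
NOISE = re.compile(r"^[\W_]*$")


def unicode_cleanup(blocks: list[str]) -> list[str]:
    # Stand-in for ftfy: NFKC folds full-width forms into their plain equivalents.
    return [unicodedata.normalize("NFKC", b) for b in blocks]


def rule_cleanup(blocks: list[str]) -> list[str]:
    return [b for b in blocks if not NOISE.match(b)]


def run_cleanup(
    blocks: list[str], ai_cleanup: Callable[[list[str]], list[str]]
) -> tuple[list[str], bool]:
    """Run all three stages; return (blocks, whether the AI stage succeeded)."""
    blocks = rule_cleanup(unicode_cleanup(blocks))
    try:
        return ai_cleanup(blocks), True
    except Exception:
        # The AI stage failed: keep the deterministic result instead of
        # failing the whole run.
        return blocks, False
```

Because the first two stages always produce the same output for the same input, a failed AI stage still leaves a usable, reproducible result.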

3. Heartbeats - Long-running activities send heartbeats to prevent timeouts:

Site generation: Every 5 pages during image rendering
Gemini cleanup: Before/after LLM API calls (2min timeout)
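
The "every 5 pages" cadence looks roughly like the loop below. The `heartbeat` parameter is a plain callback so the sketch runs on its own; inside a real Temporal activity that role would be played by `activity.heartbeat(...)`, and `render_pages` is a hypothetical name.

```python
from typing import Callable


def render_pages(page_count: int, heartbeat: Callable[[int], None], every: int = 5) -> int:
    """Process pages one by one, reporting progress every `every` pages."""
    for done in range(1, page_count + 1):
        # ... convert page `done` to a WebP image here ...
        if done % every == 0:
            heartbeat(done)  # tells the server the activity is still alive
    return page_count


beats: list[int] = []
render_pages(12, beats.append)
```

Passing the page number as the heartbeat detail also gives a retried attempt a record of how far the previous one got.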

4. Workflow Determinism - Non-deterministic operations moved outside workflows:

Path operations (stem, name extraction) moved to CLI layer
Manual name and output directory computed before workflow starts
Only deterministic data transformations in workflow code
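
A sketch of that split, with hypothetical names (`WorkflowInput`, `build_input`, and the `output` root are illustrative, not taken from the project): the CLI derives every path-based value once and hands the workflow plain strings.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class WorkflowInput:
    pdf_path: str
    manual_name: str
    output_dir: str


def build_input(pdf_path: str, output_root: str = "output") -> WorkflowInput:
    """Runs in the CLI, before the workflow starts."""
    path = PurePosixPath(pdf_path)
    name = path.stem  # file name without its extension
    return WorkflowInput(
        pdf_path=str(path),
        manual_name=name,
        output_dir=str(PurePosixPath(output_root) / name),
    )


job = build_input("manuals/dx-driver.pdf")
```

The workflow then only reads `job.manual_name` and `job.output_dir`, so replaying its history never re-derives anything from the filesystem.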

5. Type-Safe AI with Pydantic - LLM responses validated before reaching workflow:

from pydantic import BaseModel

class GeminiCleanupResponse(BaseModel):
    remove: list[str]           # Block indices to remove
    corrections: dict[str, str] # Index → corrected text
    product_name: str           # Official product name

Gemini returns JSON → Pydantic validates structure → Invalid responses trigger Temporal activity retry → Type safety across entire pipeline.
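
The validate-then-apply step might look like the following dependency-free sketch. It hand-checks the same three fields instead of using Pydantic so that it runs with the standard library alone, and it assumes the indices are stringified positions in the block list, which the model above suggests but does not confirm. `parse_response` and `apply_response` are hypothetical names.

```python
import json


def parse_response(raw: str) -> dict:
    """Raise ValueError unless `raw` has the three expected, correctly typed fields."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    remove = data.get("remove")
    corrections = data.get("corrections")
    if not (isinstance(remove, list) and all(isinstance(i, str) for i in remove)):
        raise ValueError("remove must be a list of strings")
    if not (isinstance(corrections, dict)
            and all(isinstance(v, str) for v in corrections.values())):
        raise ValueError("corrections must map indices to strings")
    if not isinstance(data.get("product_name"), str):
        raise ValueError("product_name must be a string")
    return data


def apply_response(blocks: list[str], data: dict) -> list[str]:
    """Drop removed blocks and substitute corrected text, keeping order."""
    removed = set(data["remove"])
    return [
        data["corrections"].get(str(i), text)
        for i, text in enumerate(blocks)
        if str(i) not in removed
    ]
```

When code like this runs inside an activity, letting the `ValueError` propagate fails that attempt, and the activity's retry policy decides whether the model is asked again.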

Performance & Cost

Typical 20-page manual:

Time: ~50 seconds with single worker (30s OCR parallel, 5s translation, 10s AI cleanup, 5s site generation)
Cost: ~$0.43 (Document AI $0.03 + Translation API $0.40 + Gemini free)

Cost estimates (December 2024 pricing):

Small (5 pages): ~$0.11
Medium (20 pages): ~$0.43
Large (50 pages): ~$1.08
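
These three estimates are consistent with a simple per-page model. The sketch below derives the per-page rates from the 20-page breakdown above ($0.03 and $0.40 over 20 pages) and assumes cost scales linearly with page count; it is not official pricing, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-page rates implied by the 20-page example (assumes linear scaling).
DOCUMENT_AI_PER_PAGE = Decimal("0.03") / 20   # $0.0015 per page
TRANSLATION_PER_PAGE = Decimal("0.40") / 20   # $0.02 per page


def estimate_cost(pages: int) -> Decimal:
    """Estimated cost in dollars, rounded half-up to the cent."""
    total = pages * (DOCUMENT_AI_PER_PAGE + TRANSLATION_PER_PAGE)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Translation dominates the total at roughly 93% of the per-page cost, so the OCR step is close to negligible at this scale.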

Gemini 1.5 Flash is used on its free tier (15 RPM, 1M TPM, 1500 RPD), which is sufficient for hobby use.
