Improving our Java SDK with Codex by OpenAI

We were recently invited to test and provide feedback on Codex, a new cloud-based software engineering agent from OpenAI that can work on many tasks in parallel. Turns out, Codex gave us a surprisingly effective way to level up our Temporal SDKs.

We set out simply to provide some usability feedback and move on, but Codex had other plans. It quickly found its place in our toolchain, and folks across Temporal started getting meaningful value from it.

What Codex Helped Us Build

Codex was able to find and fix bugs, implement features, and perform some surprising refactoring operations. Codex complemented existing tools by providing a new way of performing work in the background while local development continued in IDEs and editors.

For the Java SDK, Codex helped us tackle some longstanding issues that we hadn’t found time for yet: improving code coverage, adding documentation, and implementing features that our other SDKs already had but the Java SDK was missing.

For example, our other SDKs already had an API to count workflow executions using search attributes, something the Java SDK was missing. After we gave Codex the GitHub issue and a brief summary of the requirements, it implemented the API and added an end-to-end test. Codex’s change still needed some minor work to rename some classes and clean up the test, but it did most of the work unsupervised.

A common pattern among engineers was to go through the backlog before lunch or at the end of the day, identify a few open issues, and kick off multiple Codex tasks. After lunch or at the start of the next day, they would come back and iterate from there.

Check out a sampling of PRs made with Codex in our open source repos:

This exploration also gave us an opportunity to prepare our projects for agentic contributions. An AGENTS.md is a plain-text Markdown file that can be added to the top level of a repo or to any subdirectory within it. The goal of the file is to give Codex guidance that helps it understand the overall codebase and how best to interact with it.

Here is a sample from our sdk-java repo:

# Contributor Quickstart Guide

## Repository Layout
- `temporal-sdk`: core SDK implementation.
- `temporal-testing`: utilities to help write workflow and activity tests.
- `temporal-test-server`: in-memory Temporal server for fast tests.
- `temporal-serviceclient`: gRPC client for communicating with the service.
- `temporal-shaded`: prepackaged version of the SDK with shaded dependencies.
- `temporal-spring-boot-autoconfigure`: Spring Boot auto configuration.
- `temporal-kotlin`: Kotlin DSL for the SDK.
- `temporal-opentracing`: OpenTracing interceptor integration.

## General Guidance
- Avoid changing public API signatures. Anything under an `internal` directory
  is not part of the public API and may change freely.
- The SDK code is written for Java 8.

## Building and Testing
1. Format the code before committing:
   ```bash
   ./gradlew --offline spotlessApply
   ```
2. Run the tests. A full build requires a local Temporal Server instance.
   ```bash
   ./gradlew test
   ```
   To run only the core SDK tests or a single test:
   ```bash
   ./gradlew :temporal-sdk:test --offline --tests "io.temporal.workflow.*"
   ./gradlew :temporal-sdk:test --offline --tests "<package.ClassName>"
   ```
3. Build the project:
   ```bash
   ./gradlew clean build
   ```

## Tests
- Tests use JUnit4 and are located under
  `temporal-sdk/src/test/java/io/temporal`.
- Workflow API tests should rely on `SDKTestWorkflowRule` to create a worker and
  register workflows, activities, and nexus services.

## Commit Messages and Pull Requests
- Follow the [Chris Beams](http://chris.beams.io/posts/git-commit/) style for
  commit messages.
- Every pull request should answer:
  - **What changed?**
  - **Why?**
  - **Breaking changes?**
  - **Server PR** (if the change requires a coordinated server update)
- Comments should be complete sentences and end with a period.

## Review Checklist
- `./gradlew spotlessCheck` must pass.
- All tests from `./gradlew test` must succeed.
- Add new tests for any new feature or bug fix.
- Update documentation for user facing changes.

For more details see `CONTRIBUTING.md` in the repository root.

AGENTS.md files can be written from scratch, or you can bootstrap the process by prompting Codex to create a draft based on other instructions in the repo. Since Codex can run multiple jobs in parallel, you could even ask it to generate the file several times and compare the results.
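Because an AGENTS.md can live at the top level of a repo or in any subdirectory, it can help to see which files sit on the path to a given source file. The sketch below is only an illustration: `AgentsFileLocator` and `applicableGuides` are hypothetical names, and the assumption that guidance is collected from the repo root down to the file's own directory is ours, not a description of how Codex actually resolves these files. It sticks to Java 8 APIs, matching the guidance in the sample above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Codex or the Temporal SDK.
class AgentsFileLocator {

  // Returns every AGENTS.md found between repoRoot and the directory that
  // contains targetFile, ordered from the repo root downward.
  // Assumption: targetFile is a path to a file (it does not need to exist).
  static List<Path> applicableGuides(Path repoRoot, Path targetFile) {
    Path root = repoRoot.toAbsolutePath().normalize();
    Path file = targetFile.toAbsolutePath().normalize();
    if (!file.startsWith(root)) {
      throw new IllegalArgumentException(targetFile + " is not inside " + repoRoot);
    }
    List<Path> guides = new ArrayList<>();
    Path dir = root;
    addIfPresent(dir, guides);
    // Walk down one path segment at a time, skipping the last segment (the file itself).
    Path relative = root.relativize(file);
    for (int i = 0; i < relative.getNameCount() - 1; i++) {
      dir = dir.resolve(relative.getName(i));
      addIfPresent(dir, guides);
    }
    return guides;
  }

  private static void addIfPresent(Path dir, List<Path> guides) {
    Path guide = dir.resolve("AGENTS.md");
    if (Files.isRegularFile(guide)) {
      guides.add(guide);
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a throwaway repo layout with guidance at the root and in one module.
    Path root = Files.createTempDirectory("sdk-java-demo").toAbsolutePath().normalize();
    Path srcDir = Files.createDirectories(root.resolve("temporal-sdk").resolve("src"));
    Files.write(root.resolve("AGENTS.md"), "# Repo guidance\n".getBytes(StandardCharsets.UTF_8));
    Files.write(
        root.resolve("temporal-sdk").resolve("AGENTS.md"),
        "# Module guidance\n".getBytes(StandardCharsets.UTF_8));

    // Print each applicable guide relative to the repo root.
    for (Path guide : applicableGuides(root, srcDir.resolve("Example.java"))) {
      System.out.println(root.relativize(guide));
    }
  }
}
```

Running `main` lists the root-level file followed by the one in `temporal-sdk`, which is a quick way to check that module-specific guidance ended up where you intended.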

We will continue to work across our open source projects in the coming months to make them better suited for agentic contributions.

We’re looking forward to seeing how Codex evolves in the future!

Powered by Temporal

This new, awesome product just so happens to run on Temporal. While we were busy putting Codex to work on our SDKs, it was quietly relying on Temporal to power its own reliability and scalability.

We asked the OpenAI team what role Temporal played behind the scenes. Here’s what they had to say:

“Temporal is a critical part of the infrastructure powering Codex, responsible for executing our core control flows. It allows us to easily reason about concurrency, correctness, and fault tolerance, enabling us to move quickly to implement and scale a complicated distributed system required to build a product like Codex.” — Will Wang, SWE, Codex by OpenAI.

Are you curious how Codex and other AI systems achieve reliable execution at scale? Temporal Cloud makes that kind of reliability easier to build than you might think.

Give it a try with your first $1,000 in usage on us.
