Local Reinforcement Learning Example
This package lets developers simulate a job-scheduling system in which a model decides how jobs are scheduled. It provides three workflows: a training loop workflow that trains the model, an inference workflow that runs inference, and an end-to-end test workflow. The package combines RLlib, OpenAI Gym, and Temporal. Gym and RLlib provide an out-of-the-box framework for environment setup, training, and inference. Temporal provides automatic recovery from training failures, support for long-running training processes, and job traceability. Training metrics are published to TensorBoard (a native RLlib feature).
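To make the environment side of this more concrete, here is a minimal sketch of a job-scheduling environment that follows the classic Gym `reset`/`step` convention (the older 4-tuple return). It is not this package's actual environment. The class name `ToyJobSchedulingEnv`, the reward definition (negative completion time), and the two baseline policies are all assumptions made for illustration, and the sketch deliberately avoids importing Gym so that it runs standalone.

```python
from typing import Callable, List, Sequence, Tuple


class ToyJobSchedulingEnv:
    """Hypothetical single-machine scheduler with a Gym-style interface.

    Illustrative only: this is not the environment shipped in the package.
    """

    def __init__(self, durations: Sequence[int]):
        self.durations = list(durations)
        self.finished: List[bool] = []
        self.clock = 0

    def reset(self) -> List[int]:
        # Start a new episode with every job pending and the clock at zero.
        self.finished = [False] * len(self.durations)
        self.clock = 0
        return self._observation()

    def _observation(self) -> List[int]:
        # Remaining duration per job; 0 marks a job that has already run.
        return [0 if done else d for d, done in zip(self.durations, self.finished)]

    def step(self, action: int) -> Tuple[List[int], float, bool, dict]:
        if self.finished[action]:
            # Assumed rule: picking a finished job is penalized and changes nothing.
            penalty = -float(sum(self.durations))
            return self._observation(), penalty, False, {"invalid": True}
        # Run the chosen job to completion on the single machine.
        self.clock += self.durations[action]
        self.finished[action] = True
        reward = -float(self.clock)  # assumed reward: negative completion time
        return self._observation(), reward, all(self.finished), {"invalid": False}


def shortest_job_first(obs: List[int]) -> int:
    # Pick the pending job with the smallest duration (ties go to the lower index).
    return min((d, i) for i, d in enumerate(obs) if d > 0)[1]


def first_come_first_served(obs: List[int]) -> int:
    # Pick the first pending job in submission order.
    return next(i for i, d in enumerate(obs) if d > 0)


def run_episode(env: ToyJobSchedulingEnv, policy: Callable[[List[int]], int]) -> float:
    # Roll out one episode and return the summed reward.
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    env = ToyJobSchedulingEnv(durations=[3, 1, 2])
    print("shortest job first:", run_episode(env, shortest_job_first))
    print("first come first served:", run_episode(env, first_come_first_served))
```

In this toy setup, a trained policy would take the place of `shortest_job_first`. The fixed baselines are useful mainly as reference points when checking whether training improves the summed reward.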
I plan to keep adding support for other RL methods (GRPO next) and to tune the training logic to improve results.