Reformulation of the Decision Transformer for the robomimic Benchmark

Reinforcement learning typically involves an agent interacting with an environment to maximize cumulative reward. Our project sets aside the traditional approach of estimating policies and recasts reinforcement learning as a sequence modeling problem that can be solved effectively by the Transformer architecture. We extend the capabilities of the original Decision Transformer (DT) [4] to learn from mixed-quality input data. Our modified Decision Transformer quantifies the benefit of return-conditioned imitation learning on mixed-quality data by leveraging the robomimic datasets. We show that it significantly outperforms standard behavioral cloning on mixed-quality data for the Lift and Can tasks. Overall, our Decision Transformer and semi-sparse reward function provide a new way to tackle the challenges of imitation learning with mixed-quality data. The project described below is an earlier version of a paper that I plan to publish in the near future.
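As a rough illustration of the return-conditioning idea described above, the sketch below computes the return-to-go sequence that a Decision Transformer consumes alongside states and actions. This is not the project's actual code; the function names and the particular semi-sparse reward shape are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the project's implementation) of
# return-to-go computation for Decision Transformer conditioning.

def returns_to_go(rewards):
    """For each timestep t, sum the rewards from t to the episode's end."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def semi_sparse_reward(success_flags):
    """Hypothetical semi-sparse signal: 1.0 on success steps, else 0.0."""
    return [1.0 if s else 0.0 for s in success_flags]

# A trajectory that succeeds only on its final step still yields a
# nonzero return-to-go at every earlier step, which is what lets the
# model distinguish high-return from low-return demonstrations in
# mixed-quality data.
rewards = semi_sparse_reward([False, False, False, True])
print(returns_to_go(rewards))  # -> [1.0, 1.0, 1.0, 1.0]
```

At inference time, the model is prompted with a high target return and the return-to-go is decremented by each observed reward, steering generation toward the better trajectories in the training distribution.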

Decision Transformer for Robot Imitation Learning.pdf