TAMER+RL

We have also studied how robots can learn from both human reward and a predefined evaluation function (i.e., the reward function of a Markov decision process, or MDP), combining TAMER with more conventional reinforcement learning. In this setting, the evaluation function alone defines correct behavior: the robot’s performance is judged solely by the evaluation function’s output, while the trainer’s feedback serves as guidance. This combination produced considerable improvements over learning without human training, in both learning speed and final performance.
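As a rough illustration of the division of roles described above, the sketch below biases action selection with a model of human reward, while only the MDP reward drives value updates and measures performance. It is a minimal toy under stated assumptions, not the method from the papers listed below: the chain environment, the hand-written stand-in `toy_human_model`, the weight `beta`, and all function names are hypothetical and chosen only for this example.

```python
def toy_human_model(state, action):
    """Hypothetical stand-in for a learned model of human reward:
    this imagined trainer approves of moving right (+1) and disapproves of moving left (-1)."""
    return 1.0 if action == +1 else -1.0

def select_action(q, h_hat, state, actions, beta):
    """Pick the action maximizing Q(s, a) + beta * H(s, a).
    The human model only guides the choice; it never changes Q itself."""
    return max(actions, key=lambda a: q.get((state, a), 0.0) + beta * h_hat(state, a))

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update driven by the MDP reward alone."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def run_episode(q, h_hat, beta, n_states=5, max_steps=50):
    """Toy chain MDP (an assumption for this sketch): states 0..n_states-1,
    MDP reward 1 on reaching the last state. Returns the total MDP reward,
    which is the only measure of performance."""
    actions = (-1, +1)
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = select_action(q, h_hat, state, actions, beta)
        next_state = min(max(state + action, 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        q_update(q, state, action, reward, next_state, actions)
        total += reward
        state = next_state
        if reward > 0:
            break  # goal reached, episode ends
    return total

if __name__ == "__main__":
    q, beta = {}, 1.0
    for episode in range(20):
        ret = run_episode(q, toy_human_model, beta)
        beta *= 0.9  # assumed schedule: rely less on guidance as Q improves
        print(episode, ret)
```

Decaying `beta` is one simple way to let the MDP reward keep final authority: guidance matters most early on, when the value estimates carry little information, and fades as they improve. Other ways of combining the two signals are possible.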

Relevant publications

W. Bradley Knox and Peter Stone. Reinforcement Learning from Simultaneous Human and MDP Reward. In Proceedings of the Eleventh International Conference on Autonomous Agents and Multiagent Systems. June 2012.
[pdf] (930 kB)
AAMAS 2012

W. Bradley Knox and Peter Stone. Combining Manual Feedback with Subsequent MDP Reward Signals for Reinforcement Learning. In Proceedings of the Ninth International Conference on Autonomous Agents and Multiagent Systems. May 2010.
Pragnesh Jay Modi Best Student Paper Award
[pdf] (433 kB)
Poster: [pdf] (2.9 MB)
AAMAS 2010
Supplemental video for the AAMAS 2012 paper. (Skimming the paper first will make the video easier to follow.)