We have also considered how robots can learn from both human reward and a predefined evaluation function (i.e., the reward function of a Markov Decision Process), combining TAMER with more conventional reinforcement learning. In this setting, the evaluation function alone defines correct behavior: the robot’s performance is judged solely by the evaluation function’s output, while the trainer’s feedback serves as guidance. Compared with learning without human training, this combination considerably improved both learning speed and final performance.
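
The text above does not say how the two signals are combined, so the following is only a minimal sketch of one possibility: reward shaping, where a weighted human-reward estimate is added to the MDP reward during learning, and the resulting policy is scored by the MDP reward alone. The chain environment, the `human_reward` stand-in for a learned model of trainer feedback, the weight `beta`, and all other names and parameter values are assumptions made for illustration, not the method used in this research.

```python
import random

ACTIONS = (-1, +1)  # move left or right along a chain of states


def human_reward(state, action):
    """Hypothetical stand-in for a learned model of trainer feedback."""
    return 1.0 if action == +1 else -1.0


def mdp_reward(next_state, goal):
    """Predefined evaluation function: 1 on reaching the goal, else 0."""
    return 1.0 if next_state == goal else 0.0


def train(n_states=6, episodes=200, beta=1.0, alpha=0.5,
          gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on the shaped reward: MDP reward + beta * human reward."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state = min(max(state + action, 0), goal)
            done = next_state == goal
            # Assumed combination rule: human reward only guides learning.
            reward = mdp_reward(next_state, goal) + beta * human_reward(state, action)
            future = 0.0 if done else gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def evaluate(q, n_states=6, max_steps=50):
    """Score the greedy policy using the MDP reward only."""
    goal = n_states - 1
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state = min(max(state + action, 0), goal)
        total += mdp_reward(state, goal)
        if state == goal:
            break
    return total
```

Calling `evaluate(train())` scores the learned greedy policy without any human-reward term, mirroring the idea that only the evaluation function judges performance. Setting `beta=0.0` in `train` gives a baseline that learns from the MDP reward alone.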

Relevant publications

