Reward instances (valued +1 or -1) are visualized in the bottom-left box labeled “Incoming Reward”. The input features for the TAMER-learned reward model are the action and the distance and angle to the Vicon-tracked marker. The learned predictive model of human reward is shown in the three squares at the bottom right, from a bird's-eye perspective of the robotic agent and the marker. The robot Nexi faces upward, and the marker is shown as a white triangle. (Joining the YouTube HTML5 trial at youtube.com/html5 lets you watch these videos at double speed in a compatible browser.)
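To make the setup above concrete, here is a minimal sketch of a TAMER-style reward model over those inputs. The action set, feature encoding, and linear model form are all illustrative assumptions, not the implementation used in the videos; the only details taken from the text are the input features (action, distance, angle to the marker) and the +1/-1 reward signals.

```python
# Hypothetical action set; the real robot's actions are not listed in the text.
ACTIONS = ["forward", "backward", "turn_left", "turn_right", "stop"]

def features(action, distance, angle):
    """One-hot action indicator concatenated with the two continuous inputs."""
    one_hot = [1.0 if a == action else 0.0 for a in ACTIONS]
    return one_hot + [distance, angle]

class TamerRewardModel:
    """Linear model of human reward, updated incrementally from +1/-1 signals.

    This is an assumed model form for illustration; TAMER itself only
    specifies learning a predictive model of the trainer's reward.
    """

    def __init__(self, n_features, lr=0.1):
        self.weights = [0.0] * n_features
        self.lr = lr

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, human_reward):
        # Gradient step toward the trainer's +1 or -1 signal.
        error = human_reward - self.predict(x)
        self.weights = [w + self.lr * error * xi
                        for w, xi in zip(self.weights, x)]

# Usage: repeatedly reward "forward" when the marker is 1 m straight ahead,
# and the model's predicted human reward for that state-action rises.
model = TamerRewardModel(n_features=len(ACTIONS) + 2)
x_good = features("forward", 1.0, 0.0)
for _ in range(50):
    model.update(x_good, +1.0)
```

An agent trained this way would then act greedily, choosing the action with the highest predicted human reward in the current state.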
Many thanks to Stefan Grabowski and Paula Aguilera for their help in editing the videos above.
Below are less information-rich videos of training sessions, covering each of the five behaviors. They are presented in chronological order.
Go To (success)
Magnetic Control (first failure)
Magnetic Control (second failure)
Magnetic Control (third failure)
Magnetic Control (fourth attempt, stopped early for debugging)
Keep Conversational Distance (success)
Look Away (success)
Magnetic Control (success)
Toy Tantrum (success)