Learning Social Interaction from the Wizard, 2012-
To program social robotic behavior, we developed a new technique: Learning from the Wizard (LfW). Here, a robot employs a form of machine learning called learning from demonstration to derive a model of a hidden puppeteer’s control during previous sessions of social interaction. (Interaction with secretly human-controlled interfaces is referred to as Wizard-of-Oz interaction.) With this model, the robot can autonomously emulate its former puppeteer, removing the need for a human operator. We developed and evaluated this technique within the task domain of acting as a robot learning companion, where young children played an educational app with the robot. The autonomous robot performed as well as a human-controlled robot on most metrics. Curiously, where the metrics did differ considerably, they favored the autonomous robot, which was judged a more desirable play companion, more like a peer, and less like a teacher.
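As a rough illustration of the idea, not the model from the paper: the logged wizard sessions can be treated as supervised demonstration data, pairing the interaction state with the action the puppeteer chose. The minimal sketch below learns a lookup-table policy that imitates the wizard's most frequent action per state; the state and action names are made up for the example.

```python
from collections import Counter, defaultdict

def learn_wizard_policy(logs):
    """logs: iterable of (state_features, wizard_action) pairs from
    Wizard-of-Oz sessions. Returns a policy dict mapping each observed
    state to the wizard's most frequent action there -- a toy stand-in
    for the learned model described above."""
    counts = defaultdict(Counter)
    for state, action in logs:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}
```

A real system would generalize across unseen states (e.g., with a classifier over interaction features) rather than memorize them, but the supervised framing is the same.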
2014 AAAI MLIS workshop
Learning from Human-Generated Reward, 2007-
Much of my research has focused on algorithms that facilitate teaching by signals of approval and disapproval from a live human trainer. Operationalizing these signals as numeric reward in a reinforcement learning framework, we ask: Given the reward that technically unskilled users actually provide, how should a robot learn from these signals to behave as desired by the trainer? Relevant publications: my dissertation and those in sub-projects below
Sub-projects on human-generated reward:
TAMER – The TAMER framework is our myopic, model-based approach to learning only from human reward (described further here; videos of training here). Relevant publications: ICDL 2008, K-CAP 2009, HRI Late-breaking Reports 2012, ICSR 2013
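The gist of TAMER can be sketched in a few lines: learn a model H(s, a) of the human's reward and act greedily on it, with no discounting of future reward (the agent is myopic). This tabular version is a minimal stand-in for illustration; the actual framework uses function approximation and credit assignment over delayed feedback.

```python
import random
from collections import defaultdict

class TamerAgent:
    """Toy tabular TAMER-style agent: models human reward H(s, a) and
    exploits that model greedily, rather than maximizing a discounted
    sum of future reward."""

    def __init__(self, actions, step_size=0.1):
        self.actions = actions
        self.step_size = step_size
        self.h = defaultdict(float)  # predicted human reward per (s, a)

    def act(self, state):
        # Choose the action with the highest predicted human reward,
        # breaking ties randomly.
        best = max(self.h[(state, a)] for a in self.actions)
        return random.choice(
            [a for a in self.actions if self.h[(state, a)] == best])

    def update(self, state, action, human_reward):
        # Move the prediction for (s, a) toward the trainer's signal.
        key = (state, action)
        self.h[key] += self.step_size * (human_reward - self.h[key])
```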
TAMER+RL – TAMER+RL incorporates TAMER into various strategies for learning from both human-generated reward and hard-coded reward from a Markov Decision Process (described further here). Relevant publications: AAMAS 2010, AAMAS 2012
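One such combination strategy is reward shaping: run an ordinary RL algorithm on the MDP reward augmented by the learned human-reward model, weighted by a parameter. The sketch below shows a single Q-learning step under that strategy; the function names and signature are illustrative, not the papers' API.

```python
def q_update(q, s, a, r_mdp, s_next, h_hat, actions,
             alpha=0.1, gamma=0.9, beta=1.0):
    """One Q-learning step on the shaped reward R + beta * H_hat(s, a),
    a hypothetical sketch of the reward-shaping combination strategy.
    q: dict of Q-values keyed by (state, action);
    h_hat: learned human-reward model, called as h_hat(s, a)."""
    r_shaped = r_mdp + beta * h_hat(s, a)  # augment the MDP reward
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r_shaped + gamma * best_next - old)
    return q[(s, a)]
```

Annealing beta toward zero over time lets the agent rely on the human model early and on the MDP reward in the long run, one of the design axes the papers explore.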
Directly influencing the trainer – Using TAMER agents, we studied how human trainers respond to changes in their perception of the agent and to certain changes in the agent’s behavior. One contribution of this work is a demonstration that learning agents can induce their human trainers to give more frequent feedback by deliberately lowering the quality of their behavior. In other research, we examined how changes to the training interface affected trainers. When the agent displayed summaries of its own past performance to the trainer, we found that trainers gave more feedback and trained agents to higher performance levels. However, when the agent instead displayed a measure of its confidence in its own behavior, trainers gave more feedback but trained agents to worse performance than they did without an informative interface. Relevant publications: IJSR 2012, AAMAS 2013
Aligning Learning Objectives and Task Performance in Reinforcement Learning from Human Reward – In this line of research, we examined the impact of various algorithmic assumptions (e.g., the temporal discount rate and whether the agent experiences separate episodes of learning) on the agent’s performance on the task (described further here). Relevant publications: Ro-Man 2012, IUI 2013
Inferring Trust for HRI, 2012-2013
Nonverbal behaviors such as gaze patterns, body language, and facial expressions have been explored as “honest” or “leaky” signals that are often unconsciously given and received. Using supervised learning, we created a computational model that assesses interpersonal trust in social interactions, discriminating based on the trust-related nonverbal cues expressed during the interaction. Feature generation and extraction were informed by previous psychological studies on nonverbal behavior and trust. In predicting trust-based behavior in an economic exchange game, the trust classifier significantly outperformed human judgment and various baseline models. Through this work on learning evaluative models of human trust, we hope to enable a robot to judge how much it is trusted by its human collaborators. With this feedback signal, robots could learn how to build trust, which has been shown to improve communication and may thus be critical for human-robot collaboration. Relevant publication: Frontiers in Psychology 2013
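To make the supervised-learning framing concrete, here is a toy version of the pipeline: each interaction becomes a vector of nonverbal-cue counts, and a simple linear classifier is trained to predict the binary trust-related behavior. The perceptron and the cue names are illustrative stand-ins, not the model or features from the paper.

```python
def train_cue_classifier(examples, epochs=20, lr=0.1):
    """examples: list of (cue_counts, label) pairs, where cue_counts is
    a list of how often each nonverbal cue occurred in the interaction
    and label is 1/0 for the trust-related behavior. Trains a toy
    perceptron and returns a predict function."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            err = y - pred  # perceptron update on misclassification
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return lambda x: 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
```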
Animal-Robot Interaction, 2013-
Like people, dogs and cats live among technologies that affect their lives. Yet little of this technology has been designed with these pets in mind. We are developing systems that interact intelligently with animals to entertain, exercise, and empower them. Currently, we are developing a laser-chasing game, in which dogs or cats are tracked by a ceiling-mounted webcam, and a computer-controlled laser is moved with knowledge of the pet’s position and movement. Machine learning will be applied to optimize the specific laser strategy. We envision enabling owners to initiate and view the interaction remotely through a web interface, providing stimulation and exercise to pets when the owners are at work or otherwise cannot be present.
Interactive Museum of Reinforcement Learning, 2013-
The Interactive Museum of RL is a series of Java applets that show various combinations of reinforcement learning algorithms and task environments. It is meant as a place for those who are new to reinforcement learning to obtain an intuitive understanding of the field. Agents and environments are connected and visualized by RL Applet (see below). More agents and environments are needed. Please contact me if you have some of either to add.
RL Applet, 2012
Building upon the Java libraries for RL-Glue, I created an applet that can display real-time reinforcement learning. This applet runs in the web browser of nearly any user. Its intended functions include improving public awareness of reinforcement learning, aiding the teaching/understanding of reinforcement learning algorithms, and inspiring students to study reinforcement learning. Further, RL Applet can easily be adapted to allow human interaction with learning agents. Any Java-based, RL-Glue-compatible agent can be used, and environments must be Java-based and compatible with RL-Library’s RL Viz application. (More info and source download here.)
Belief-Directed Exploration in Humans, 2010-12
To study the trade-off in humans between exploiting current knowledge of the best behavior and exploring to learn more about behavioral outcomes, we computed a model of optimal behavior on an experimental task that is partially observable, formulating the task as a POMDP. Fitting human task behavior with this model and with less optimal variants, we found that human behavior appears reflective (considering past experience and unobservable task dynamics) but suboptimally lacks a precise assessment of the informational value of exploration. Relevant publication: Frontiers in Psychology 2012
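The optimal model rests on Bayesian belief updating over the task's hidden state: b'(s') is proportional to O(obs | s', action) times the sum over s of T(s' | s, action) b(s). A minimal sketch for a discrete POMDP, with T and O as assumed lookup functions rather than the experiment's actual dynamics:

```python
def belief_update(belief, action, obs, T, O, states):
    """Exact Bayesian belief update for a discrete POMDP.
    belief: dict state -> probability; T(s, a, s2) is the transition
    probability; O(s2, a, o) is the observation likelihood. Returns the
    normalized posterior belief."""
    unnorm = {s2: O(s2, action, obs) *
                  sum(T(s, action, s2) * belief[s] for s in states)
              for s2 in states}
    z = sum(unnorm.values())  # probability of the observation
    return {s2: p / z for s2, p in unnorm.items()}
```

An optimal agent plans over such beliefs, which is what gives exploration its quantifiable informational value; the model variants we fit degrade exactly that computation.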
Improving Fraud Detection through Relational Modeling, 2010
Working with Apple’s Data Mining Group in a 2010 internship, I improved their established fraud-detection model by constructing relational features that exploit the interdependence of online orders. On historical data, these improvements catch an additional $3.5 million of attempted fraud annually, almost half of what was previously missed.
Automatic Diagnosis of Electrical System Failures, 2008
At NASA Ames, I worked with Dr. Ole Mengshoel on automatic diagnosis of electrical system failures, modeling the system with large dynamic Bayesian Networks. Before I departed, we took steps toward diagnosing and reconfiguring the system to adapt to failures. Relevant publication: IJCAI 2009 Workshop on …
Home-assistant robots, 2006-07
On a robot built from a Segway RMP base, we designed simple behaviors that build towards the complex capabilities required by a home-assistant robot. Especially adept at recognizing people, our robot earned 2nd place among 11 excellent teams in the 2007 Robocup@Home competition. Full website. Relevant publications: Robocup Symposium 2008, JoPha 2008
Opponent Modeling in the Robocup Coach Competition, 2005-2006
In the Robocup Simulated Coach League, participants’ algorithms must infer what patterns are being displayed by a simulated soccer team. For example, a pattern could be that Player 1 dribbles to the goal area and then passes to Player 2, who then shoots on the goal. For my first research project, I developed an opponent modeling algorithm that competed in the 2005 Robocup Simulated Coach League and emerged as world champion. The following year we received 2nd place. Relevant publication: AAAI 2006