Research Associate Professor of Computer Science at UT Austin
Contact:
wbradleyknox@gmail.com
Google Scholar

My most recent research on RLHF and value-aligned reward specification can be found here.

Here is a 5-minute talk I gave at the New Orleans Alignment Workshop on reframing the common RLHF fine-tuning algorithm. A longer version that covers much more is embedded below.


My research career has encompassed the control of robots and other agents, machine learning (reinforcement learning in particular), human-computer interaction, and computational models of human behavior for cognitive science research. I am particularly drawn to problem specification: both specifying problems myself for novel research topics and studying how to enable users to specify problems for learning agents so that the agents' objectives are aligned with those users' interests.