Intelligent Robot Learning Laboratory (IRL Lab)

Agent Learning from Discrete Human Feedback

By: Bei Peng and Matthew E. Taylor

In this project, we consider the problem of a human trainer teaching an agent by providing positive or negative feedback. Most existing work has treated human feedback as a numerical value that the agent seeks to maximize, and has assumed that all trainers give feedback in the same way when teaching the same behavior. In contrast, we treat feedback as a discrete communication from trainer to learner, and we recognize that different trainers choose different training strategies: for example, a trainer who rarely punishes conveys information even by staying silent after an incorrect action. We propose a probabilistic model that captures these strategies, and we present the SABL and I-SABL algorithms, which consider multiple interpretations of trainer feedback in order to learn behaviors more efficiently. Our online user studies show that human trainers do follow various training strategies when teaching virtual agents, and that explicitly modeling trainer strategy allows a learner to make inferences even from cases where no feedback is given. [1, 2, 3]
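The feedback model behind SABL can be made concrete. The trainer is assumed to reward correct actions and punish incorrect ones, but may withhold explicit feedback with strategy-dependent probabilities (written μ+ for withheld reward and μ− for withheld punishment in the papers) and may err with a small probability ε. Below is a minimal Python sketch of this strategy-aware Bayesian update for a single-state task with a handful of candidate target actions; the parameter values, names, and the simplified one-state setup are illustrative assumptions of ours, not the published implementation, which handles full policies over many states.

```python
# Minimal sketch of a SABL-style strategy-aware Bayesian update.
# The single-state task, action names, and parameter values are
# illustrative assumptions, not the authors' implementation.

ACTIONS = ["left", "right", "forward", "back"]

EPSILON = 0.1   # trainer error rate: probability of giving the "wrong" feedback
MU_PLUS = 0.6   # probability the trainer withholds reward after a correct action
MU_MINUS = 0.2  # probability the trainer withholds punishment after an incorrect one


def feedback_likelihood(feedback, action_correct):
    """P(feedback | whether the action matched the target behavior).

    Silence ("none") is informative whenever MU_PLUS != MU_MINUS: a trainer
    who rarely withholds punishment is implicitly rewarding by staying quiet.
    """
    if action_correct:
        return {
            "reward": (1 - EPSILON) * (1 - MU_PLUS),
            "punish": EPSILON * (1 - MU_MINUS),
            "none": (1 - EPSILON) * MU_PLUS + EPSILON * MU_MINUS,
        }[feedback]
    return {
        "reward": EPSILON * (1 - MU_PLUS),
        "punish": (1 - EPSILON) * (1 - MU_MINUS),
        "none": EPSILON * MU_PLUS + (1 - EPSILON) * MU_MINUS,
    }[feedback]


def sabl_update(posterior, action, feedback):
    """One Bayesian update of the belief over which action is the target."""
    unnormalized = {
        a: p * feedback_likelihood(feedback, action_correct=(a == action))
        for a, p in posterior.items()
    }
    z = sum(unnormalized.values())
    return {a: p / z for a, p in unnormalized.items()}


# Usage: start from a uniform prior, then observe three interactions.
# The two silent responses after "forward" still shift belief toward it.
belief = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
for act, fb in [("left", "punish"), ("forward", "none"), ("forward", "none")]:
    belief = sabl_update(belief, act, fb)
print(max(belief, key=belief.get))  # -> "forward"
```

With a trainer who frequently withholds reward but rarely withholds punishment (μ+ > μ−), silence after an action raises the learner's belief that the action was correct; this is the "learning something from nothing" effect that a purely numerical interpretation of feedback discards. I-SABL extends this idea by additionally inferring the unknown strategy parameters rather than assuming them.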

[1] Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, Jeff Huang, and David L. Roberts. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Autonomous Agents and Multi-Agent Systems, pages 1–30, 2015.
[Bibtex]
@article{2015AAMAS-Loftin,
  author={Robert Loftin and Bei Peng and James MacGlashan and Michael L. Littman and Matthew E. Taylor and Jeff Huang and David L. Roberts},
  title={{Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning}},
  journal={{Autonomous Agents and Multi-Agent Systems}},
  pages={1--30},
  year={2015},
  doi={10.1007/s10458-015-9283-7},
  publisher={Springer},
  url={http://link.springer.com/article/10.1007%2Fs10458-015-9283-7},
  abstract={For real-world applications, virtual agents must be able to learn new behaviors from non-technical users. Positive and negative feedback are an intuitive way to train new behaviors, and existing work has presented algorithms for learning from such feedback. That work, however, treats feedback as numeric reward to be maximized, and assumes that all trainers provide feedback in the same way. In this work, we show that users can provide feedback in many different ways, which we describe as “training strategies.” Specifically, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa, such that the lack of feedback itself conveys information about the behavior. We present a probabilistic model of trainer feedback that describes how a trainer chooses to provide explicit reward and/or explicit punishment and, based on this model, develop two novel learning algorithms (SABL and I-SABL) which take trainer strategy into account, and can therefore learn from cases where no feedback is provided. Through online user studies we demonstrate that these algorithms can learn with less feedback than algorithms based on a numerical interpretation of feedback. Furthermore, we conduct an empirical analysis of the training strategies employed by users, and of factors that can affect their choice of strategy.},
}
[2] Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, David L. Roberts, and Jeff Huang. Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), August 2014.
[Bibtex]
@inproceedings{2014ROMAN-Loftin,
  author={Robert Loftin and Bei Peng and James MacGlashan and Michael L. Littman and Matthew E. Taylor and David L. Roberts and Jeff Huang},
  title={{Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies}},
  booktitle={{Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication ({RO-MAN})}},
  month={August},
  year={2014},
  bib2html_pubtype={Refereed Conference},
  bib2html_rescat={Reinforcement Learning},
}
[3] Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, Jeff Huang, and David L. Roberts. A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), July 2014. 28% acceptance rate.
[Bibtex]
@inproceedings{2014AAAI-Loftin,
  author={Robert Loftin and Bei Peng and James MacGlashan and Michael L. Littman and Matthew E. Taylor and Jeff Huang and David L. Roberts},
  title={{A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback}},
  booktitle={{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
  month={July},
  year={2014},
  note={28\% acceptance rate},
  bib2html_pubtype={Refereed Conference},
  bib2html_rescat={Reinforcement Learning},
}