Training an Agent to Ground Commands with Reward and Punishment

By: Bei Peng and Matthew E. Taylor

As the need grows for humans without technical expertise to convey complex tasks to robots, natural language offers an intuitive interface. However, this requires the agent to learn a grounding of natural language commands. In this work, we developed a simple simulated home environment in which a robot must learn to complete tasks from a human trainer's positive and negative feedback. [1, 2, 3, 4]
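To make the setup concrete, below is a minimal Python sketch (not the project's actual code) of the feedback-as-communication idea underlying this line of work [3]: reward, punishment, and silence are treated as probabilistic evidence about which candidate grounding of a command the trainer intends, and the agent maintains a Bayesian belief over those candidates. The grid world, the two candidate policies, and the feedback-model parameters are illustrative assumptions, not the actual study configuration.

# Minimal sketch (illustrative only) of learning a command grounding from
# human reward and punishment. Following the feedback-as-communication view in [3],
# reward, punishment, and silence are treated as probabilistic evidence about which
# candidate policy the trainer intends, rather than as a numeric reward to maximize.

ACTIONS = ["up", "down", "left", "right"]

# Two hypothetical candidate groundings of a command such as "go to the kitchen":
# each maps an (x, y) cell to the action the trainer would want there.
def go_right_then_up(state):
    x, y = state
    return "right" if x < 4 else "up"

def go_up_then_right(state):
    x, y = state
    return "up" if y < 4 else "right"

CANDIDATES = {"right-then-up": go_right_then_up, "up-then-right": go_up_then_right}

# Assumed trainer model: correct actions draw reward, incorrect ones draw punishment,
# but the trainer stays silent with probability MU_PLUS / MU_MINUS respectively,
# and gives mistaken feedback with probability EPS.
MU_PLUS, MU_MINUS, EPS = 0.3, 0.3, 0.05

def feedback_likelihood(feedback, action, state, policy):
    """P(feedback | the trainer intends `policy` and the agent took `action` in `state`)."""
    correct = action == policy(state)
    if feedback == "reward":
        return (1 - MU_PLUS) * (1 - EPS) if correct else (1 - MU_MINUS) * EPS
    if feedback == "punish":
        return (1 - MU_PLUS) * EPS if correct else (1 - MU_MINUS) * (1 - EPS)
    return MU_PLUS if correct else MU_MINUS  # silence

def update_belief(belief, feedback, action, state):
    """Bayesian update of the belief over candidate groundings after one feedback signal."""
    posterior = {name: p * feedback_likelihood(feedback, action, state, CANDIDATES[name])
                 for name, p in belief.items()}
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

def step_env(state, action):
    """Deterministic 5x5 grid dynamics."""
    x, y = state
    dx, dy = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}[action]
    return (min(max(x + dx, 0), 4), min(max(y + dy, 0), 4))

if __name__ == "__main__":
    belief = {name: 1.0 / len(CANDIDATES) for name in CANDIDATES}
    state = (0, 0)
    target = CANDIDATES["right-then-up"]  # the grounding the simulated trainer has in mind
    for step in range(10):
        best = max(belief, key=belief.get)   # act under the most probable grounding
        action = CANDIDATES[best](state)
        feedback = "reward" if action == target(state) else "punish"
        belief = update_belief(belief, feedback, action, state)
        state = step_env(state, action)
        print(step, state, action, feedback, {k: round(v, 3) for k, v in belief.items()})

In the strategy-aware models described in [3], the silence parameters themselves are also learned from experience; here they are fixed constants to keep the sketch short.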

[1] [pdf] Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. An Empirical Study of Non-Expert Curriculum Design for Machine Learners. In Proceedings of the Interactive Machine Learning workshop (at IJCAI), New York City, NY, USA, July 2016.
[Bibtex]
@inproceedings{2016IML-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{An Empirical Study of Non-Expert Curriculum Design for Machine Learners}},
booktitle={{Proceedings of the Interactive Machine Learning workshop (at {IJCAI})}},
month={July},
year={2016},
address={New York City, NY, USA},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Existing machine-learning work has shown that algorithms can benefit from curriculum learning, a strategy where the target behavior of the learner is changed over time. However, most existing work focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, neglecting how non-expert humans may design curricula. In this work we introduce a curriculum-design problem in the context of reinforcement learning and conduct a user study to explicitly explore how non-expert humans go about assembling curricula. We present results from 80 participants on Amazon Mechanical Turk that show 1) humans can successfully design curricula that gradually introduce more complex concepts to the agent within each curriculum, and even across different curricula, and 2) users choose to add task complexity in different ways and follow salient principles when selecting tasks into the curriculum. This work serves as an important first step towards better integration of non-expert humans into the reinforcement learning process and the development of new machine learning algorithms to accommodate human teaching strategies.}
}
[2] [pdf] Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016. 24.9% acceptance rate
[Bibtex]
@inproceedings{2016AAMAS-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans}},
booktitle={{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2016},
note={24.9% acceptance rate},
video={https://www.youtube.com/watch?v=AJQSGD_XPrk},
bib2html_pubtype={Refereed Conference},
abstract={As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work presents a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.}
}
[3] [pdf] Bei Peng, Robert Loftin, James MacGlashan, Michael L. Littman, Matthew E. Taylor, and David L. Roberts. Language and Policy Learning from Human-delivered Feedback. In Proceedings of the Machine Learning for Social Robotics workshop (at ICRA), May 2015.
[Bibtex]
@inproceedings{2015ICRA-Peng,
author={Bei Peng and Robert Loftin and James MacGlashan and Michael L. Littman and Matthew E. Taylor and David L. Roberts},
title={{Language and Policy Learning from Human-delivered Feedback}},
booktitle={{Proceedings of the Machine Learning for Social Robotics workshop (at {ICRA})}},
month={May},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Using rewards and punishments is a common and familiar paradigm for humans to train intelligent agents. Most existing learning algorithms in this paradigm follow a framework in which human feedback is treated as a numerical signal to be maximized by the agent. However, treating feedback as a numeric signal fails to capitalize on implied information the human trainer conveys with a lack of explicit feedback. For example, a trainer may withhold reward to signal to the agent a failure, or they may withhold punishment to signal that the agent is behaving correctly. We review our progress to date with Strategy-aware Bayesian Learning, which is able to learn from experience the ways trainers use feedback, and can exploit that knowledge to accelerate learning. Our work covers contextual bandits, goal-directed sequential decision-making tasks, and natural language command learning. We present a user study design to identify how users’ feedback strategies are affected by properties of the environment and agent competency for natural language command learning in sequential decision making tasks, which will inform the development of more adaptive models of human feedback in the future.}
}
[4] [pdf] James MacGlashan, Michael L. Littman, Robert Loftin, Bei Peng, David L. Roberts, and Matthew E. Taylor. Training an Agent to Ground Commands with Reward and Punishment. In Proceedings of the Machine Learning for Interactive Systems workshop (at AAAI), July 2014.
[Bibtex]
@inproceedings{2014MLIS-James,
title={{Training an Agent to Ground Commands with Reward and Punishment}},
author={James MacGlashan and Michael L. Littman and Robert Loftin and Bei Peng and David L. Roberts and Matthew E. Taylor},
booktitle={{Proceedings of the Machine Learning for Interactive Systems workshop (at {AAAI})}},
month={July},
year={2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning}
}