Agent Corrections to Pac-Man from the Crowd

By: Gabriel V. de la Cruz Jr., Bei Peng, and Matthew E. Taylor

Reinforcement learning often suffers from poor initial performance and long learning times. Our approach uses crowdsourcing to gather non-expert suggestions that speed up an RL agent's learning. We currently use Ms. Pac-Man as our application domain because of its popularity as a game. Our studies have already shown that crowd workers, although non-experts, are good at identifying an agent's mistakes. We are now working on how to integrate the crowd's advice to speed up the RL agent's learning. In the future, we intend to apply this approach to a physical robot. [1, 2]
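As a concrete illustration, the sketch below shows one simple way crowd advice could be injected into a tabular Q-learner: when a crowd-suggested action is available, the agent follows it with some probability instead of its own epsilon-greedy choice. This integration rule, and the names AdvisedQLearner and advice_prob, are illustrative assumptions for this sketch, not the mechanism from [1, 2].

# Minimal sketch (Python): biasing an epsilon-greedy Q-learner with crowd advice.
# The advice-following rule and all names here are illustrative assumptions,
# not the method from the papers cited above.
import random
from collections import defaultdict

class AdvisedQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 epsilon=0.1, advice_prob=0.5):
        self.q = defaultdict(float)      # Q-values keyed by (state, action)
        self.actions = actions
        self.alpha = alpha               # learning rate
        self.gamma = gamma               # discount factor
        self.epsilon = epsilon           # exploration rate
        self.advice_prob = advice_prob   # chance of following crowd advice

    def act(self, state, advised_action=None):
        # When crowd advice is available, follow it with probability
        # advice_prob; otherwise fall back to ordinary epsilon-greedy.
        if advised_action is not None and random.random() < self.advice_prob:
            return advised_action
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

Under this scheme, advice only shapes exploration; the Q-update is untouched, so occasional bad crowd suggestions are eventually overridden by the agent's own experience.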

[1] Gabriel V. de la Cruz Jr., Bei Peng, Walter S. Lasecki, and Matthew E. Taylor. Towards Integrating Real-Time Crowd Advice with Reinforcement Learning. In The 20th ACM Conference on Intelligent User Interfaces (IUI), March 2015. doi: 10.1145/2732158.2732180. Poster: 41% acceptance rate for poster submissions.
BibTeX:
@inproceedings{2015IUI-Delacruz,
author={de la Cruz, Jr., Gabriel V. and Peng, Bei and Lasecki, Walter S. and Taylor, Matthew E.},
title={{Towards Integrating Real-Time Crowd Advice with Reinforcement Learning}},
booktitle={{The 20th {ACM} Conference on Intelligent User Interfaces ({IUI})}},
month={March},
year={2015},
doi={10.1145/2732158.2732180},
note={Poster: 41% acceptance rate for poster submissions},
wwwnote={<a href="http://iui.acm.org/2015/">ACM iUI-15</a>},
bib2html_rescat={Reinforcement Learning, Crowdsourcing},
bib2html_pubtype={Short Refereed Conference},
bib2html_funding={NSF},
abstract={Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Demonstrating that the crowd is capable of generating this input, and discussing the types of errors that occur, serves as a critical first step in designing systems that use this real-time feedback to improve systems' learning performance on-the-fly.},
}
[2] Gabriel V. de la Cruz Jr., Bei Peng, Walter S. Lasecki, and Matthew E. Taylor. Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents. In Proceedings of the Learning for General Competency in Video Games workshop (AAAI), January 2015.
BibTeX:
@inproceedings{2015AAAI-Delacruz,
title={{Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents}},
author={de la Cruz, Jr., Gabriel V. and Peng, Bei and Lasecki, Walter S. and Taylor, Matthew E.},
booktitle={{Proceedings of the Learning for General Competency in Video Games workshop ({AAAI})}},
month={January},
year={2015},
wwwnote={<a href="http://www.arcadelearningenvironment.org/aaai15-workshop/">The Arcade Learning Environment</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Crowdsourcing},
bib2html_funding={NSF},
abstract={Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Our results demonstrate that the crowd is capable of generating helpful input. We conclude with a discussion of the types of errors that occur most commonly when engaging human workers for this task, and a discussion of how such data could be used to improve learning. Our work serves as a critical first step in designing systems that use real-time human feedback to improve the learning performance of automated systems on-the-fly.},
}