Intelligent Robot Learning Laboratory (IRL Lab) List of All Publications


### 2017

• Salam El Bsat, Haitham Bou Ammar, and Matthew E. Taylor. Scalable Multitask Policy Gradient Reinforcement Learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), February 2017. 25% acceptance rate

Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach that aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer from scalability issues when considering a large number of tasks. The main reason behind this limitation is the reliance on centralized solutions. This paper proposes a novel distributed multitask RL framework, improving scalability across many different types of tasks. Our framework maps multitask RL to an instance of general consensus and develops an efficient decentralized solver. We justify the correctness of the algorithm both theoretically and empirically: we first prove an improvement of the convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks.

@inproceedings{2017AAAI-ElBsat,
author={El Bsat, Salam and Bou Ammar, Haitham and Taylor, Matthew E.},
title={{Scalable Multitask Policy Gradient Reinforcement Learning}},
booktitle={{Proceedings of the 31st {AAAI} Conference on Artificial Intelligence ({AAAI})}},
month={February},
year={2017},
note={25% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
abstract={Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach that aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer from scalability issues when considering a large number of tasks. The main reason behind this limitation is the reliance on centralized solutions. This paper proposes a novel distributed multitask RL framework, improving scalability across many different types of tasks. Our framework maps multitask RL to an instance of general consensus and develops an efficient decentralized solver. We justify the correctness of the algorithm both theoretically and empirically: we first prove an improvement of the convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks.}
}
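The general-consensus step at the heart of this framework can be illustrated with a minimal numpy sketch; the ring topology, the toy quadratic objectives, and all names below are invented for illustration and are not taken from the paper. Each node takes a gradient step on its local task, then averages its parameter with its neighbors, so the network converges toward agreement on a shared solution.

```python
import numpy as np

def decentralized_consensus(targets, neighbors, steps=500, lr=0.05):
    """Each node i minimizes a local quadratic 0.5*(theta_i - targets[i])**2,
    then averages its parameter with its neighbors (the consensus step)."""
    theta = np.zeros(len(targets))
    for _ in range(steps):
        theta = theta - lr * (theta - targets)  # local gradient step per node
        # consensus: replace each value by the mean over its closed neighborhood
        theta = np.array([np.mean([theta[j] for j in neighbors[i] + [i]])
                          for i in range(len(theta))])
    return theta

# 4 nodes on a ring; with a doubly stochastic mixing step, all nodes
# converge near the global optimum, the mean of the local targets (2.5)
targets = np.array([1.0, 2.0, 3.0, 4.0])
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
theta = decentralized_consensus(targets, ring)
```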

• James MacGlashan, Mark K. Ho, Robert Loftin, Bei Peng, David Roberts, Matthew E. Taylor, and Michael L. Littman. Interactive Learning from Policy-Dependent Human Feedback. Technical Report arXiv:1701.06049, January 2017.

For agents and robots to become more useful, they must be able to quickly learn from non-technical users. This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner’s current policy. We present empirical results that show this assumption to be false—whether human trainers give a positive or negative feedback for a decision is influenced by the learner’s current policy. We argue that policy-dependent feedback, in addition to being commonplace, enables useful training strategies from which agents should benefit. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot, even with noisy image features.

@techreport{2017arXiv-MacGlashan,
author={MacGlashan, James and Ho, Mark K. and Loftin, Robert and Peng, Bei and Roberts, David and Taylor, Matthew E. and Littman, Michael L.},
title={{Interactive Learning from Policy-Dependent Human Feedback}},
institution={ArXiv e-prints},
archivePrefix={arXiv},
eprint={1701.06049},
primaryClass={cs.AI},
keywords={Computer Science - Artificial Intelligence, I.2.6},
year={2017},
month={January},
abstract={For agents and robots to become more useful, they must be able to quickly learn from non-technical users. This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner's current policy. We present empirical results that show this assumption to be false---whether human trainers give a positive or negative feedback for a decision is influenced by the learner's current policy. We argue that policy-dependent feedback, in addition to being commonplace, enables useful training strategies from which agents should benefit. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot, even with noisy image features.}
}
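A minimal sketch of the kind of actor update COACH builds on, in which the human's feedback plays the role of the advantage estimate in a policy-gradient step. The single-state task, the simulated trainer, and the step size are invented stand-ins, not the paper's robot experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
theta = np.zeros(n_actions)          # action preferences for a one-state task

def softmax(z):
    z = z - z.max()                  # stabilize before exponentiating
    p = np.exp(z)
    return p / p.sum()

alpha = 0.5
for _ in range(300):
    probs = softmax(theta)
    a = rng.choice(n_actions, p=probs)
    # simulated trainer stands in for the human: +1 if the chosen action is
    # the target action (2), -1 otherwise
    feedback = 1.0 if a == 2 else -1.0
    # actor update: feedback is treated as the advantage estimate
    grad_log = -probs
    grad_log[a] += 1.0               # gradient of log softmax w.r.t. theta
    theta += alpha * feedback * grad_log

probs = softmax(theta)               # policy concentrates on the target action
```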

• Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. Curriculum Design for Machine Learners in Sequential Decision Tasks (Extended Abstract). In Proceedings of the 2017 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2017. Extended abstract: 26% acceptance rate for papers, additional 22% for extended abstracts.

Existing machine-learning work has shown that algorithms can benefit from curricula—learning first on simple examples before moving to more difficult examples. While most existing work on curriculum learning focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, relatively little attention has been paid to the ways in which humans design curricula. We argue that a better understanding of the human-designed curricula could give us insights into the development of new machine learning algorithms and interfaces that can better accommodate machine- or human-created curricula. Our work addresses this emerging and vital area empirically, taking an important step to characterize the nature of human-designed curricula relative to the space of possible curricula and the performance benefits that may (or may not) occur.

@inproceedings{2017AAMAS-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{Curriculum Design for Machine Learners in Sequential Decision Tasks (Extended Abstract)}},
booktitle={{Proceedings of the 2017 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2017},
note={Extended abstract: 26% acceptance rate for papers, additional 22% for extended abstracts.},
bib2html_pubtype={Refereed Conference},
abstract={Existing machine-learning work has shown that algorithms can benefit from curricula---learning first on simple examples before moving to more difficult examples. While most existing work on curriculum learning focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, relatively little attention has been paid to the ways in which humans design curricula. We argue that a better understanding of the human-designed curricula could give us insights into the development of new machine learning algorithms and interfaces that can better accommodate machine- or human-created curricula. Our work addresses this emerging and vital area empirically, taking an important step to characterize the nature of human-designed curricula relative to the space of possible curricula and the performance benefits that may (or may not) occur.}
}

• Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. Curriculum Design for Machine Learners in Sequential Decision Tasks. In Proceedings of the Adaptive Learning Agents workshop (at AAMAS), São Paulo, Brazil, May 2017.

Existing machine-learning work has shown that algorithms can benefit from curricula—learning first on simple examples before moving to more difficult examples. This work defines the curriculum-design problem in the context of sequential decision tasks, analyzes how different curricula affect agent learning in a Sokoban-like domain, and presents results of a user study that explores whether non-experts generate such curricula. Our results show that 1) different curricula can have substantial impact on training speeds while longer curricula do not always result in worse agent performance in learning all tasks within the curricula (including the target task), 2) more benefits of curricula can be found as the target task’s complexity increases, 3) the method for providing reward feedback to the agent as it learns within a curriculum does not change which curricula are best, 4) non-expert users can successfully design curricula that result in better overall agent performance than learning from scratch, even in the absence of feedback, and 5) non-expert users can discover and follow salient principles when selecting tasks in a curriculum. This work gives us insights into the development of new machine-learning algorithms and interfaces that can better accommodate machine- or human-created curricula.

@inproceedings{ALA17-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{Curriculum Design for Machine Learners in Sequential Decision Tasks}},
booktitle={{Proceedings of the Adaptive Learning Agents workshop (at {AAMAS})}},
month={May},
year={2017},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Existing machine-learning work has shown that algorithms can benefit from curricula---learning first on simple examples before moving to more difficult examples. This work defines the curriculum-design problem in the context of sequential decision tasks, analyzes how different curricula affect agent learning in a Sokoban-like domain, and presents results of a user study that explores whether non-experts generate such curricula. Our results show that 1) different curricula can have substantial impact on training speeds while longer curricula do not always result in worse agent performance in learning all tasks within the curricula (including the target task), 2) more benefits of curricula can be found as the target task's complexity increases, 3) the method for providing reward feedback to the agent as it learns within a curriculum does not change which curricula are best, 4) non-expert users can successfully design curricula that result in better overall agent performance than learning from scratch, even in the absence of feedback, and 5) non-expert users can discover and follow salient principles when selecting tasks in a curriculum. This work gives us insights into the development of new machine-learning algorithms and interfaces that can better accommodate machine- or human-created curricula. }
}

• Matthew E. Taylor and Sakire Arslan Ay. AI Projects for Computer Science Capstone Classes (Extended Abstract). In Proceedings of the Seventh Symposium on Educational Advances in Artificial Intelligence, February 2017.

Capstone senior design projects provide students with a collaborative software design and development experience to reinforce learned material while allowing students latitude in developing real-world applications. Our two-semester capstone classes are required for all computer science majors. Students must have completed a software engineering course — capstone classes are typically taken during their last two semesters. Project proposals come from a variety of sources, including industry, WSU faculty (from our own and other departments), local agencies, and entrepreneurs. We have recently targeted projects in AI — although students typically have little background, they find the ideas and methods compelling. This paper outlines our instructional approach and reports our experiences with three projects.

@inproceedings{2017EAAI-Taylor,
author={Taylor, Matthew E. and Arslan Ay, Sakire},
title={{AI Projects for Computer Science Capstone Classes (Extended Abstract)}},
booktitle={{Proceedings of the Seventh Symposium on Educational Advances in Artificial Intelligence}},
month={February},
year={2017},
wwwnote={<a href="http://www.cs.mtu.edu/~lebrown/eaai/">EAAI-17</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Pedagogy},
abstract={Capstone senior design projects provide students with a collaborative software design and development experience to reinforce learned material while allowing students latitude in developing real-world applications. Our two-semester capstone classes are required for all computer science majors. Students must have completed a software engineering course — capstone classes are typically taken during their last two semesters. Project proposals come from a variety of sources, including industry, WSU faculty (from our own and other departments), local agencies, and entrepreneurs. We have recently targeted projects in AI — although students typically have little background, they find the ideas and methods compelling. This paper outlines our instructional approach and reports our experiences with three projects.}
}

• Leah A. Zulas, Kaitlyn I. Franz, Darrin Griechen, and Matthew E. Taylor. Solar Decathlon Competition: Towards a Solar-Powered Smart Home. In Proceedings of the AI for Smart Grids and Buildings Workshop (at AAAI), February 2017.

Alternative energy is becoming a growing source of power in the United States, including wind, hydroelectric and solar. The Solar Decathlon is a competition run by the US Department of Energy every two years. Washington State University (WSU) is one of twenty teams recently selected to compete in the fall 2017 challenge. A central part to WSU’s entry is incorporating new and existing smart home technology from the ground up. The smart home can help to optimize energy loads, battery life and general comfort of the user in the home. This paper discusses the high-level goals of the project, hardware selected, build strategy and anticipated approach.

@inproceedings{2017AAAI-Solar-Zulas,
author={Zulas, A. Leah and Franz, Kaitlyn I. and Griechen, Darrin and Taylor, Matthew E.},
title={{Solar Decathlon Competition: Towards a Solar-Powered Smart Home}},
booktitle={{Proceedings of the AI for Smart Grids and Buildings Workshop (at {AAAI})}},
month={February},
year={2017},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Alternative energy is becoming a growing source of power in the United States, including wind, hydroelectric and solar. The Solar Decathlon is a competition run by the US Department of Energy every two years. Washington State University (WSU) is one of twenty teams recently selected to compete in the fall 2017 challenge. A central part to WSU’s entry is incorporating new and existing smart home technology from the ground up. The smart home can help to optimize energy loads, battery life and general comfort of the user in the home. This paper discusses the high-level goals of the project, hardware selected, build strategy and anticipated approach.}
}

### 2016

• Chris Cain, Anne Anderson, and Matthew E. Taylor. Content-Independent Classroom Gamification. In Proceedings of the ASEE’s 123rd Annual Conference & Exposition, New Orleans, LA, USA, June 2016.

This paper introduces Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person study was implemented in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.

@inproceedings{2016ASEE-Cain,
author={Chris Cain and Anne Anderson and Matthew E. Taylor},
title={{Content-Independent Classroom Gamification}},
booktitle={{Proceedings of the {ASEE}'s 123rd Annual Conference \& Exposition}},
month={June},
year={2016},
address={New Orleans, LA, USA},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Gamification, Motivation, Education},
abstract={This paper introduces Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person study was implemented in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.}
}

• Chris Cain, Anne Anderson, and Matthew E. Taylor. Content-Independent Classroom Gamification. Computers in Education Journal, 7(4):93-106, October–December 2016.

This paper introduces Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person pilot study was implemented in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.

@article{2016CoED-Cain,
author={Cain, Chris and Anderson, Anne and Taylor, Matthew E.},
title={{Content-Independent Classroom Gamification}},
journal={{Computers in Education Journal}},
volume={7},
number={4},
pages={93--106},
month={October--December},
year={2016},
abstract={This paper introduces Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person pilot study was implemented in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.}
}

• William Curran, Tim Brys, David Aha, Matthew E. Taylor, and William D. Smart. Dimensionality Reduced Reinforcement Learning for Assistive Robots. In AAAI 2016 Fall Symposium on Artificial Intelligence: Workshop on Artificial Intelligence for Human-Robot Interaction, Arlington, VA, USA, November 2016.

State-of-the-art personal robots need to perform complex manipulation tasks to be viable in assistive scenarios. However, many of these robots, like the PR2, use manipulators with high degrees-of-freedom, and the problem is made worse in bimanual manipulation tasks. The complexity of these robots lead to large dimensional state spaces, which are difficult to learn in. We reduce the state space by using demonstrations to discover a representative low-dimensional hyperplane in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality Reduced Reinforcement Learning (DRRL). However, when performing dimensionality reduction, not all dimensions can be fully represented. We extend this work by first learning in a single dimension, and then transferring that knowledge to a higher-dimensional hyperplane. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently using few demonstrations. We show that adding IDRRL to the Q-Learning algorithm leads to faster learning on a set of mountain car tasks and the robot swimmers problem.

@inproceedings{2016AAAI-AI-HRI-Curran,
author={William Curran and Tim Brys and David Aha and Matthew E. Taylor and William D. Smart},
title={{Dimensionality Reduced Reinforcement Learning for Assistive Robots}},
booktitle={{{AAAI} 2016 Fall Symposium on Artificial Intelligence: Workshop on Artificial Intelligence for Human-Robot Interaction}},
month={November},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={State-of-the-art personal robots need to perform complex manipulation tasks to be viable in assistive scenarios. However, many of these robots, like the PR2, use manipulators with high degrees-of-freedom, and the problem is made worse in bimanual manipulation tasks. The complexity of these robots lead to large dimensional state spaces, which are difficult to learn in. We reduce the state space by using demonstrations to discover a representative low-dimensional hyperplane in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality Reduced Reinforcement Learning (DRRL). However, when performing dimensionality reduction, not all dimensions can be fully represented. We extend this work by first learning in a single dimension, and then transferring that knowledge to a higher-dimensional hyperplane. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently using few demonstrations. We show that adding IDRRL to the Q-Learning algorithm leads to faster learning on a set of mountain car tasks and the robot swimmers problem.}
}
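The dimensionality-reduction step behind DRRL/IDRRL can be sketched with plain PCA over demonstration states: learning starts in the top principal subspace of the demonstrations and iteratively adds components. This toy example, including the 4-D state and the helper names, is illustrative; the paper's tasks and learner are not reproduced here.

```python
import numpy as np

def demo_hyperplanes(demo_states):
    """Principal components of demonstration states, ordered by explained
    variance; early learning uses only the top components (IDRRL then
    iterates to higher-dimensional subspaces)."""
    mean = demo_states.mean(axis=0)
    _, _, components = np.linalg.svd(demo_states - mean, full_matrices=False)
    return mean, components

def project(state, mean, components, k):
    """Represent a full state in the k-dimensional demonstration subspace."""
    return (state - mean) @ components[:k].T

# toy demonstrations that mostly vary along one direction of a 4-D space
rng = np.random.default_rng(1)
t = rng.normal(size=(200, 1))
demos = t @ np.array([[1.0, 0.5, 0.0, -0.5]]) + 0.01 * rng.normal(size=(200, 4))

mean, comps = demo_hyperplanes(demos)
z1 = project(demos[0], mean, comps, 1)   # 1-D code for early learning
z2 = project(demos[0], mean, comps, 2)   # expanded representation later
recon = mean + z1 @ comps[:1]            # rank-1 reconstruction is near-exact
```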

• Yunshu Du, Gabriel V. de la Cruz Jr., James Irwin, and Matthew E. Taylor. Initial Progress in Transfer for Deep Reinforcement Learning Algorithms. In Proceedings of Deep Reinforcement Learning: Frontiers and Challenges workshop (at IJCAI), New York City, NY, USA, July 2016.

As one of the first successful models to combine reinforcement learning techniques with deep neural networks, the Deep Q-network (DQN) algorithm has gained attention as it bridges the gap between high-dimensional sensor inputs and autonomous agent learning. However, one main drawback of DQN is the long training time required to train on a single task. This work aims to leverage transfer learning (TL) techniques to speed up learning in DQN. We applied this technique in two domains, Atari games and cart-pole, and show that TL can improve DQN’s performance on both tasks without altering the network structure.

@inproceedings{2016DeepRL-Du,
author={Du, Yunshu and de la Cruz, Jr., Gabriel V. and Irwin, James and Taylor, Matthew E.},
title={{Initial Progress in Transfer for Deep Reinforcement Learning Algorithms}},
booktitle={{Proceedings of Deep Reinforcement Learning: Frontiers and Challenges workshop (at {IJCAI})}},
year={2016},
address={New York City, NY, USA},
month={July},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={As one of the first successful models to combine reinforcement learning techniques with deep neural networks, the Deep Q-network (DQN) algorithm has gained attention as it bridges the gap between high-dimensional sensor inputs and autonomous agent learning. However, one main drawback of DQN is the long training time required to train on a single task. This work aims to leverage transfer learning (TL) techniques to speed up learning in DQN. We applied this technique in two domains, Atari games and cart-pole, and show that TL can improve DQN’s performance on both tasks without altering the network structure.}
}
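At its simplest, the transfer described here amounts to initializing the target-task network with weights from a source-task network. Below is a hedged sketch using toy numpy "networks"; the layer sizes and helper names are invented, and the actual work uses full DQN architectures.

```python
import numpy as np

def init_network(layer_sizes, rng):
    """Random weight matrices for a small fully connected Q-network."""
    return {f"W{i}": rng.normal(scale=0.1, size=(m, n))
            for i, (m, n) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:]))}

def transfer_weights(source, target, copy_layers):
    """Initialize selected target-task layers from the source task; the
    remaining layers keep their fresh random initialization."""
    for name in copy_layers:
        if source[name].shape == target[name].shape:
            target[name] = source[name].copy()
    return target

rng = np.random.default_rng(0)
source_net = init_network([8, 16, 4], rng)   # stands in for a trained source DQN
target_net = init_network([8, 16, 4], rng)   # fresh network for the target task
target_net = transfer_weights(source_net, target_net, ["W0"])  # reuse early layer
```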

• Yunshu Du and Matthew E. Taylor. Work In-progress: Mining the Student Data for Fitness. In Proceedings of the 12th International Workshop on Agents and Data Mining Interaction (ADMI) (at AAMAS), Singapore, May 2016.

Data mining-driven agents are often used in applications such as waiting times estimation or traffic flow prediction. Such approaches often require large amounts of data from multiple sources, which may be difficult to obtain and lead to incomplete or noisy datasets. University ID card data, in contrast, is easy to access with very low noise. However, little attention has been paid to the availability of these datasets and few applications have been developed to improve student services on campus. This work uses data from CougCard, the Washington State University official ID card, used daily by most students. Our goal is to build an intelligent agent to improve student service quality by predicting the crowdedness at different campus facilities. This work in-progress focuses on the University Recreation Center, one of the most popular facilities on campus, to optimize students’ workout experiences.

@inproceedings{2016ADMI-Du,
author={Yunshu Du and Matthew E. Taylor},
title={{Work In-progress: Mining the Student Data for Fitness}},
booktitle={{Proceedings of the 12th International Workshop on Agents and Data Mining Interaction ({ADMI}) (at {AAMAS})}},
year={2016},
month={May},
abstract = {Data mining-driven agents are often used in applications such as waiting times estimation or traffic flow prediction. Such approaches often require large amounts of data from multiple sources, which may be difficult to obtain and lead to incomplete or noisy datasets. University ID card data, in contrast, is easy to access with very low noise. However, little attention has been paid to the availability of these datasets and few applications have been developed to improve student services on campus. This work uses data from CougCard, the Washington State University official ID card, used daily by most students. Our goal is to build an intelligent agent to improve student service quality by predicting the crowdedness at different campus facilities. This work in-progress focuses on the University Recreation Center, one of the most popular facilities on campus, to optimize students’ workout experiences.}
}

• Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, Enrique L. Sucar, and Enrique Munoz de Cote. Efficiently detecting switches against non-stationary opponents. Autonomous Agents and Multi-Agent Systems, pages 1-23, November 2016.

Interactions in multiagent systems are generally more complicated than single agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real world scenarios where agents have limited capacities and may deviate from a perfect rational response. Our goal is still to act optimally in these cases by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem when another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, posing a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that to obtain an optimal policy and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER guarantees to detect switches with high probability. Also, we provide empirical results showing that our approach outperforms state-of-the-art algorithms, in normal form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.

@article{2016JAAMAS2-Hernandez-Leal,
author={Pablo Hernandez-Leal and Yusen Zhan and Matthew E. Taylor and L. Enrique {Sucar} and Enrique {Munoz de Cote}},
title={{Efficiently detecting switches against non-stationary opponents}},
journal={{Autonomous Agents and Multi-Agent Systems}},
pages={1--23},
month={November},
year={2016},
doi={10.1007/s10458-016-9352-6},
url={http://dx.doi.org/10.1007/s10458-016-9352-6},
issn={1387-2532},
abstract={Interactions in multiagent systems are generally more complicated than single agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real world scenarios where agents have limited capacities and may deviate from a perfect rational response. Our goal is still to act optimally in these cases by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem when another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, posing a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that to obtain an optimal policy and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER guarantees to detect switches with high probability. Also, we provide empirical results showing that our approach outperforms state-of-the-art algorithms, in normal form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.}
}
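The switch-detection idea (monitor how well the current opponent model predicts the opponent, and re-learn when prediction quality drops) can be sketched as follows; the window size, the threshold, and the toy opponent are illustrative, not DriftER's actual statistical test.

```python
import numpy as np
from collections import deque

class SwitchDetector:
    """Tracks the current opponent model's prediction accuracy over a
    sliding window and flags a likely strategy switch when accuracy
    falls below a threshold (parameters are illustrative)."""
    def __init__(self, window=20, threshold=0.6):
        self.hits = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, predicted, actual):
        self.hits.append(1.0 if predicted == actual else 0.0)
        full = len(self.hits) == self.hits.maxlen
        return full and np.mean(self.hits) < self.threshold

detector = SwitchDetector()
switched_at = None
# toy opponent plays action 0 for 50 rounds, then switches to action 1;
# our (fixed) model keeps predicting 0, so accuracy degrades after the switch
for t in range(100):
    actual = 0 if t < 50 else 1
    if detector.observe(predicted=0, actual=actual):
        switched_at = t   # detection lags the true switch by a few rounds
        break
```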

• Pablo Hernandez-Leal, Benjamin Rosman, Matthew E. Taylor, Enrique L. Sucar, and Enrique Munoz de Cote. A Bayesian Approach for Learning and Tracking Switching, Non-stationary Opponents (Extended Abstract). In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems, Singapore, May 2016.

In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. In this paper we extend Bayesian Policy Reuse to adversarial settings where opponents switch from one stationary strategy to another. Our extension enables online learning of new models when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach yields better performance than state-of-the-art approaches in terms of average rewards.

@inproceedings{2016AAMAS-HernandezLeal,
author={Pablo Hernandez-Leal and Benjamin Rosman and Matthew E. Taylor and L. Enrique Sucar and Enrique Munoz de Cote},
title={{A Bayesian Approach for Learning and Tracking Switching, Non-stationary Opponents (Extended Abstract)}},
booktitle={{Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems}},
month={May},
year={2016},
address={Singapore},
abstract={In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. In this paper we extend Bayesian Policy Reuse to adversarial settings where opponents switch from one stationary strategy to another. Our extension enables online learning of new models when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach yields better performance than state-of-the-art approaches in terms of average rewards.}
}

• Pablo Hernandez-Leal, Matthew E. Taylor, Benjamin Rosman, L. Enrique Sucar, and Enrique Munoz de Cote. Identifying and Tracking Switching, Non-stationary Opponents: a Bayesian Approach. In Proceedings of the Multiagent Interaction without Prior Coordination workshop (at AAAI), Phoenix, AZ, USA, February 2016.

In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. Bayesian policy reuse (BPR) has been empirically shown to be efficient at correctly detecting the best policy to use from a library in sequential decision tasks. In this paper we extend BPR to adversarial settings, in particular, to opponents that switch from one stationary strategy to another. Our proposed extension enables learning new models in an online fashion when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach is capable of efficiently detecting opponent strategies and reacting quickly to behavior switches, thereby yielding better performance than state-of-the-art approaches in terms of average rewards.
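The belief-tracking core of this BPR extension can be sketched as follows. This is a minimal illustration under our own assumptions (class name, likelihood interface, and the reset-on-surprise rule are ours), not the authors' implementation:

```python
class BPRAgent:
    """Sketch of Bayesian Policy Reuse against a switching opponent.

    `models` maps an opponent-model name to a likelihood function
    P(observed payoff | that opponent model). Names and structure are
    illustrative assumptions, not the paper's code.
    """
    def __init__(self, models):
        self.models = models
        self.belief = {m: 1.0 / len(models) for m in models}

    def update(self, payoff):
        # Bayesian update: posterior belief is proportional to likelihood * prior.
        for m in self.models:
            self.belief[m] *= self.models[m](payoff)
        total = sum(self.belief.values())
        if total < 1e-12:
            # No known model explains recent payoffs: the cue to learn
            # a new opponent model online, as the paper proposes.
            self.belief = {m: 1.0 / len(self.models) for m in self.models}
            return None
        for m in self.belief:
            self.belief[m] /= total
        # Act with the policy tuned for the most probable opponent model.
        return max(self.belief, key=self.belief.get)
```

For example, with two hypothetical opponent models whose payoff likelihoods differ, repeated high payoffs concentrate the belief on the model that predicts them.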

@inproceedings{2016AAAI-HernandezLeal,
author={Pablo Hernandez-Leal and Matthew E. Taylor and Benjamin Rosman and L. Enrique Sucar and Enrique {Munoz de Cote}},
title={{Identifying and Tracking Switching, Non-stationary Opponents: a Bayesian Approach}},
booktitle={{Proceedings of the Multiagent Interaction without Prior Coordination workshop (at {AAAI})}},
year={2016},
month={February},
address={Phoenix, AZ, USA},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. Bayesian policy reuse (BPR) has been empirically shown to be efficient at correctly detecting the best policy to use from a library in sequential decision tasks. In this paper we extend BPR to adversarial settings, in particular, to opponents that switch from one stationary strategy to another. Our proposed extension enables learning new models in an online fashion when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach is capable of efficiently detecting opponent strategies and reacting quickly to behavior switches, thereby yielding better performance than state-of-the-art approaches in terms of average rewards.}
}

• Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote. An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems, pages 1–32, October 2016.

The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy in time. We use a two agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent’s objective is to learn a model of the opponent’s strategy to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model in terms of finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally how using DE outperforms the state of the art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
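The switch-detection idea behind drift exploration can be sketched in a few lines: freeze a model of a "known" state, then mark it unknown again when a sliding window of later outcomes contradicts the model. All names, thresholds, and the mean-based test below are our illustrative assumptions, not the R-max# algorithm itself:

```python
from collections import defaultdict, deque

class DriftExplorer:
    """Illustrative R-max#-style drift exploration sketch.

    A state becomes "known" after m visits; its outcomes are then frozen
    into a model. If a sliding window of later outcomes disagrees with
    the model by more than `tol`, the state is marked unknown again,
    forcing re-exploration after an opponent switch.
    """
    def __init__(self, m=5, window=10, tol=0.3):
        self.m, self.tol = m, tol
        self.counts = defaultdict(int)
        self.recent = defaultdict(lambda: deque(maxlen=window))
        self.model = {}

    def observe(self, state, outcome):
        self.counts[state] += 1
        self.recent[state].append(outcome)
        if self.counts[state] == self.m:
            self.model[state] = sum(self.recent[state]) / len(self.recent[state])

    def switch_detected(self, state):
        # Compare the recent empirical mean against the frozen prediction.
        if state not in self.model:
            return False
        mean = sum(self.recent[state]) / len(self.recent[state])
        return abs(mean - self.model[state]) > self.tol

    def relearn(self, state):
        # Forget the stale model so the state is explored and relearned.
        self.counts[state] = 0
        del self.model[state]
        self.recent[state].clear()
```

The key design point the abstract describes is exactly this coupling: exploration bookkeeping doubles as a change detector, so adaptation starts as soon as the opponent's behavior drifts.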

@article{2016JAAMAS-Hernandez-Leal,
author={Pablo Hernandez-Leal and Yusen Zhan and Matthew E. Taylor and L. Enrique {Sucar} and Enrique {Munoz de Cote}},
title={{An exploration strategy for non-stationary opponents}},
journal={{Autonomous Agents and Multi-Agent Systems}},
pages={1--32},
month={October},
year={2016},
issn={1573-7454},
doi={10.1007/s10458-016-9347-3},
url={http://dx.doi.org/10.1007/s10458-016-9347-3},
abstract={The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy in time. We use a two agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent's objective is to learn a model of the opponent's strategy to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max{\#} for learning and planning against non-stationary opponents. To handle such opponents, R-max{\#} reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) eventually explore opponent behavioral changes. We provide theoretical results showing that R-max{\#} is guaranteed to detect the opponent's switch and learn a new model in terms of finite sample complexity. R-max{\#} makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally how using DE outperforms the state of the art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.}
}

• Yang Hu and Matthew E. Taylor. A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility. Transactions on Techniques for STEM Education, October–December 2016.

@article{2016STEMTransactions-Yang,
author={Hu, Yang and Taylor, Matthew E.},
title={{A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility}},
journal={{Transactions on Techniques for {STEM} Education}},
month={October--December},
year={2016},
abstract={Taking a Computer-Aided Design (CAD) class is a prerequisite for Mechanical Engineering freshmen at many universities, including at Washington State University. The traditional way to learn CAD software is to follow examples and exercises in a textbook. However, using written instruction is not always effective because textbooks usually support a single strategy to construct a model. Missing even one detail may cause the student to become stuck, potentially leading to frustration.
To make the learning process easier and more interesting, we designed and implemented an intelligent tutorial system for an open source CAD program, FreeCAD, for the sake of teaching students some basic CAD skills (such as Boolean operations) to construct complex objects from multiple simple shapes. Instead of teaching a single method to construct a model, the program first automatically learns all possible ways to construct a model and then can teach the student to draw the 3D model in multiple ways. Previous research efforts have shown that learning multiple potential solutions can encourage students to develop the tools they need to solve new problems.
This study compares textbook learning with learning from two variants of our intelligent tutoring system. The textbook approach is considered the baseline. In the first tutorial variant, subjects were given minimal guidance and were asked to construct a model in multiple ways. Subjects in the second tutorial group were given two guided solutions to constructing a model and then asked to demonstrate the third solution when constructing the same model. Rather than directly providing instructions, participants in the first tutorial group were expected to independently explore and were only provided feedback when the program determined he/she had deviated too far from a potential solution. The three groups are compared by measuring the time needed to 1) successfully construct the same model in a testing phase, 2) use multiple methods to construct the same model in a testing phase, and 3) construct a novel model.}
}

• Yang Hu and Matthew E. Taylor. Work In Progress: A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility. In Proceedings of the ASEE’s 123rd Annual Conference & Exposition, New Orleans, LA, USA, June 2016.

@inproceedings{2016ASEE-Hu,
author={Yang Hu and Matthew E. Taylor},
title={{Work In Progress: A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility}},
booktitle={{Proceedings of the {ASEE}'s 123rd Annual Conference \& Exposition}},
month={June},
year={2016},
address={New Orleans, LA, USA},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Intelligent Tutoring System, Multiple solutions},
abstract={Taking a Computer-Aided Design (CAD) class is a prerequisite for Mechanical Engineering freshmen at many universities, including at Washington State University. The traditional way to learn CAD software is to follow examples and exercises in a textbook. However, using written instruction is not always effective because textbooks usually support a single strategy to construct a model. Missing even one detail may cause the student to become stuck, potentially leading to frustration.
To make the learning process easier and more interesting, we designed and implemented an intelligent tutorial system for an open source CAD program, FreeCAD, for the sake of teaching students some basic CAD skills (such as Boolean operations) to construct complex objects from multiple simple shapes. Instead of teaching a single method to construct a model, the program first automatically learns all possible ways to construct a model and then can teach the student to draw the 3D model in multiple ways. Previous research efforts have shown that learning multiple potential solutions can encourage students to develop the tools they need to solve new problems.
This study compares textbook learning with learning from two variants of our intelligent tutoring system. The textbook approach is considered the baseline. In the first tutorial variant, subjects were given minimal guidance and were asked to construct a model in multiple ways. Subjects in the second tutorial group were given two guided solutions to constructing a model and then asked to demonstrate the third solution when constructing the same model. Rather than directly providing instructions, participants in the second tutorial group were expected to independently explore and were only provided feedback when the program determined he/she had deviated too far from a potential solution. The three groups are compared by measuring the time needed to 1) successfully construct the same model in a testing phase, 2) use multiple methods to construct the same model in a testing phase, and 3) construct a novel model.}
}

• David Isele, José Marcio Luna, Eric Eaton, Gabriel V. de la Cruz Jr., James Irwin, Brandon Kallaher, and Matthew E. Taylor. Work in Progress: Lifelong Learning for Disturbance Rejection on Mobile Robots. In Proceedings of the Adaptive Learning Agents (ALA) workshop (at AAMAS), Singapore, May 2016.

No two robots are exactly the same — even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Further, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled. These preliminary results are an initial step towards learning robust fault-tolerant control for arbitrary robots.

@inproceedings{2016ALA-Isele,
author={Isele, David and Luna, Jos\'e Marcio and Eaton, Eric and de la Cruz, Jr., Gabriel V. and Irwin, James and Kallaher, Brandon and Taylor, Matthew E.},
title={{Work in Progress: Lifelong Learning for Disturbance Rejection on Mobile Robots}},
booktitle={{Proceedings of the Adaptive Learning Agents ({ALA}) workshop (at {AAMAS})}},
year={2016},
month={May},
abstract = {No two robots are exactly the same — even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Further, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled. These preliminary results are an initial step towards learning robust fault-tolerant control for arbitrary robots.}
}

• David Isele, José Marcio Luna, Eric Eaton, Gabriel V. de la Cruz Jr., James Irwin, Brandon Kallaher, and Matthew E. Taylor. Lifelong Learning for Disturbance Rejection on Mobile Robots. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2016. 48% acceptance rate

No two robots are exactly the same—even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Furthermore, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled.

@inproceedings{2016IROS-Isele,
author={Isele, David and Luna, Jos\'e Marcio and Eaton, Eric and de la Cruz, Jr., Gabriel V. and Irwin, James and Kallaher, Brandon and Taylor, Matthew E.},
title={{Lifelong Learning for Disturbance Rejection on Mobile Robots}},
booktitle={{Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems ({IROS})}},
month={October},
year={2016},
note={48\% acceptance rate},
video={https://youtu.be/u7pkhLx0FQ0},
bib2html_pubtype={Refereed Conference},
abstract={No two robots are exactly the same—even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Furthermore, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled.}
}

• Timothy Lewis, Amy Hurst, Matthew E. Taylor, and Cynthia Matuszek. Using Language Groundings for Context-Sensitive Text Prediction. In Proceedings of EMNLP 2016 Workshop on Uphill Battles in Language Processing, Austin, TX, USA, November 2016.

In this paper, we present the concept of using language groundings for context-sensitive text prediction using a semantically informed, context-aware language model. We show initial findings from a preliminary study investigating how users react to a communication interface driven by context-based prediction using a simple language model. We suggest that the results support further exploration using a more informed semantic model and more realistic context.

@inproceedings{2016EMNLP-Lewis,
author={Timothy Lewis and Amy Hurst and Matthew E. Taylor and Cynthia Matuszek},
title={{Using Language Groundings for Context-Sensitive Text Prediction}},
booktitle={{Proceedings of {EMNLP} 2016 Workshop on Uphill Battles in Language Processing}},
month={November},
year={2016},
address={Austin, TX, USA},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={In this paper, we present the concept of using language groundings for context-sensitive text prediction using a semantically informed, context-aware language model. We show initial findings from a preliminary study investigating how users react to a communication interface driven by context-based prediction using a simple language model. We suggest that the results support further exploration using a more informed semantic model and more realistic context.}
}

• Robert Loftin, Matthew E. Taylor, Michael L. Littman, James MacGlashan, Bei Peng, and David L. Roberts. Open Problems for Online Bayesian Inference in Neural Networks. In Proceedings of Bayesian Deep Learning workshop (at NIPS), December 2016.
@inproceedings{2016NIPS-BayesDL-Loftin,
author={Robert Loftin and Matthew E. Taylor and Michael L. Littman and James MacGlashan and Bei Peng and David L. Roberts},
title={{Open Problems for Online Bayesian Inference in Neural Networks}},
booktitle={{Proceedings of Bayesian Deep Learning workshop (at {NIPS})}},
month={December},
year={2016},
url={http://bayesiandeeplearning.org/papers/BDL_42.pdf},
bib2html_pubtype={Refereed Workshop or Symposium}
}

• Robert Loftin, James MacGlashan, Bei Peng, Matthew E. Taylor, Michael L. Littman, and David L. Roberts. Towards Behavior-Aware Model Learning from Human-Generated Trajectories. In AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction, Arlington, VA, USA, November 2016.

Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process, based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible, and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.
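The core of the objective, scoring a candidate transition model by the probability it assigns to the demonstrated actions, can be sketched on a toy problem. The Boltzmann action model and all names below are our assumptions for illustration, not the paper's code:

```python
import math

def action_log_likelihood(q_under_model, trajectory, beta=5.0):
    """Score a candidate transition model by how likely it makes the
    observed actions, assuming users act Boltzmann-rationally on the
    Q-values induced by that model (an IRL-style assumption).

    `q_under_model[s]` is the toy list of Q-values a planner would
    produce from the candidate model; `trajectory` is (state, action) pairs.
    """
    ll = 0.0
    for s, a in trajectory:
        qs = q_under_model[s]
        log_z = math.log(sum(math.exp(beta * q) for q in qs))
        ll += beta * qs[a] - log_z  # log softmax probability of action a
    return ll
```

In this sketch one would pick the candidate model with the higher log-likelihood; BAM instead ascends this kind of objective with a policy-gradient-style update over model parameters.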

@inproceedings{2016AAAI-AI-HRI-Loftin,
author={Robert Loftin and James MacGlashan and Bei Peng and Matthew E. Taylor and Michael L. Littman and David L. Roberts},
title={{Towards Behavior-Aware Model Learning from Human-Generated Trajectories}},
booktitle={{{AAAI} Fall Symposium on Artificial Intelligence for Human-Robot Interaction}},
month={November},
year={2016},
address={Arlington, VA, USA},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process, based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible, and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.}
}

• James MacGlashan, Michael L. Littman, David L. Roberts, Robert Loftin, Bei Peng, and Matthew E. Taylor. Convergent Actor Critic by Humans. In Workshop on Human-Robot Collaboration: Towards Co-Adaptive Learning Through Semi-Autonomy and Shared Control (at IROS), October 2016.

Programming robot behavior can be painstaking: for a layperson, this path is unavailable without investing significant effort in building up proficiency in coding. In contrast, nearly half of American households have a pet dog and at least some exposure to animal training, suggesting an alternative path for customizing robot behavior. Unfortunately, most existing reinforcement-learning (RL) algorithms are not well suited to learning from human-delivered reinforcement. This paper introduces a framework for incorporating human-delivered rewards into RL algorithms and preliminary results demonstrating feasibility.

@inproceedings{2016IROS-HRC-MacGlashan,
author={James MacGlashan and Michael L. Littman and David L. Roberts and Robert Loftin and Bei Peng and Matthew E. Taylor},
title={{Convergent Actor Critic by Humans}},
booktitle={{Workshop on Human-Robot Collaboration: Towards Co-Adaptive Learning Through Semi-Autonomy and Shared Control (at {IROS})}},
month={October},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Programming robot behavior can be painstaking: for a layperson, this path is unavailable without investing significant effort in building up proficiency in coding. In contrast, nearly half of American households have a pet dog and at least some exposure to animal training, suggesting an alternative path for customizing robot behavior. Unfortunately, most existing reinforcement-learning (RL) algorithms are not well suited to learning from human-delivered reinforcement. This paper introduces a framework for incorporating human-delivered rewards into RL algorithms and preliminary results demonstrating feasibility.}
}

• Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016. 24.9% acceptance rate

As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work presents a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.

@inproceedings{2016AAMAS-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans}},
booktitle={{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2016},
note={24.9\% acceptance rate},
bib2html_pubtype={Refereed Conference},
abstract={As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work presents a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.}
}

• Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. An Empirical Study of Non-Expert Curriculum Design for Machine Learners. In Proceedings of the Interactive Machine Learning workshop (at IJCAI), New York City, NY, USA, July 2016.

Existing machine-learning work has shown that algorithms can benefit from curriculum learning, a strategy where the target behavior of the learner is changed over time. However, most existing work focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, neglecting how non-expert humans may design curricula. In this work we introduce a curriculum-design problem in the context of reinforcement learning and conduct a user study to explicitly explore how non-expert humans go about assembling curricula. We present results from 80 participants on Amazon Mechanical Turk that show 1) humans can successfully design curricula that gradually introduce more complex concepts to the agent within each curriculum, and even across different curricula, and 2) users choose to add task complexity in different ways and follow salient principles when selecting tasks into the curriculum. This work serves as an important first step towards better integration of non-expert humans into the reinforcement learning process and the development of new machine learning algorithms to accommodate human teaching strategies.

@inproceedings{2016IML-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{An Empirical Study of Non-Expert Curriculum Design for Machine Learners}},
booktitle={{Proceedings of the Interactive Machine Learning workshop (at {IJCAI})}},
month={July},
year={2016},
address={New York City, NY, USA},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Existing machine-learning work has shown that algorithms can benefit from curriculum learning, a strategy where the target behavior of the learner is changed over time. However, most existing work focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, neglecting how non-expert humans may design curricula. In this work we introduce a curriculum-design problem in the context of reinforcement learning and conduct a user study to explicitly explore how non-expert humans go about assembling curricula. We present results from 80 participants on Amazon Mechanical Turk that show 1) humans can successfully design curricula that gradually introduce more complex concepts to the agent within each curriculum, and even across different curricula, and 2) users choose to add task complexity in different ways and follow salient principles when selecting tasks into the curriculum. This work serves as an important first step towards better integration of non-expert humans into the reinforcement learning process and the development of new machine learning algorithms to accommodate human teaching strategies.}
}

• Halit Bener Suay, Tim Brys, Matthew E. Taylor, and Sonia Chernova. Learning from Demonstration for Shaping through Inverse Reinforcement Learning. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016. 24.9% acceptance rate

Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel three-step approach to improve model-free reinforcement learning agents’ performance. Specifically, we collect demonstration data, use the data to recover a linear function using inverse reinforcement learning, and use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high-dimensional domains. To show the scalability of our approach we present two sets of experiments in a two-dimensional Maze domain and the 27-dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state-of-the-art in cumulative reward, learning rate, and asymptotic performance.

@inproceedings{2016AAMAS-Suay,
author={Suay, Halit Bener and Brys, Tim and Taylor, Matthew E. and Chernova, Sonia},
title={{Learning from Demonstration for Shaping through Inverse Reinforcement Learning}},
booktitle={{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2016},
note={24.9% acceptance rate},
bib2html_pubtype={Refereed Conference},
abstract={Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel three-step approach to improve model-free reinforcement learning agents’ performance. Specifically, we collect demonstration data, use the data to recover a linear function using inverse reinforcement learning, and use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high-dimensional domains. To show the scalability of our approach we present two sets of experiments in a two-dimensional Maze domain and the 27-dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state-of-the-art in cumulative reward, learning rate, and asymptotic performance.}
}
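The shaping step the abstract describes follows the standard potential-based form F(s, s') = γΦ(s') − Φ(s), which is known to preserve the optimal policy. A minimal sketch, assuming a linear potential over state features (the feature weights here are a hypothetical stand-in for weights recovered via IRL):

```python
import numpy as np

def potential(state, weights):
    """Linear potential over state features, e.g. weights recovered by IRL.
    The specific weights used here are purely illustrative."""
    return float(np.dot(weights, state))

def shaped_reward(reward, state, next_state, weights, gamma=0.99):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s).
    Adding this term does not change the optimal policy."""
    return reward + gamma * potential(next_state, weights) - potential(state, weights)

# Hypothetical example: a 2-feature state that moves toward the goal
# direction favored by the recovered weights gets a positive bonus.
w = np.array([1.0, 0.0])
r = shaped_reward(0.0, np.array([0.0, 0.0]), np.array([1.0, 0.0]), w, gamma=0.9)
```

Any learner that consumes scalar rewards can use `shaped_reward` in place of the raw environment reward without further modification.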

• Zhaodong Wang and Matthew E. Taylor. Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study. In AAAI 2016 Spring Symposium, March 2016.

There are many successful methods for transferring information from one agent to another. One approach, taken in this work, is to have one (source) agent demonstrate a policy to a second (target) agent, and then have that second agent improve upon the policy. By allowing the target agent to observe the source agent’s demonstrations, rather than relying on other types of direct knowledge transfer like Q-values, rules, or shared representations, we remove the need for the agents to know anything about each other’s internal representation or have a shared language. In this work, we introduce a refinement to HAT, an existing transfer learning method, by integrating the target agent’s confidence in its representation of the source agent’s policy. Results show that a target agent can effectively 1) improve its initial performance relative to learning without transfer (jumpstart) and 2) improve its performance relative to the source agent (total reward). Furthermore, both the jumpstart and total reward are improved with this new refinement, relative to learning without transfer and relative to learning with HAT.

@inproceedings{2016AAAI-SSS-Wang,
author={Zhaodong Wang and Matthew E. Taylor},
title={{Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study}},
booktitle={{{AAAI} 2016 Spring Symposium}},
month={March},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={There are many successful methods for transferring information from one agent to another. One approach, taken in this work, is to have one (source) agent demonstrate a policy to a second (target) agent, and then have that second agent improve upon the policy. By allowing the target agent to observe the source agent's demonstrations, rather than relying on other types of direct knowledge transfer like Q-values, rules, or shared representations, we remove the need for the agents to know anything about each other's internal representation or have a shared language. In this work, we introduce a refinement to HAT, an existing transfer learning method, by integrating the target agent's confidence in its representation of the source agent's policy. Results show that a target agent can effectively 1) improve its initial performance relative to learning without transfer (jumpstart) and 2) improve its performance relative to the source agent (total reward). Furthermore, both the jumpstart and total reward are improved with this new refinement, relative to learning without transfer and relative to learning with HAT.}
}

• Ruofei Xu, Robin Hartshorn, Ryan Huard, James Irwin, Kaitlyn Johnson, Gregory Nelson, Jon Campbell, Sakire Arslan Ay, and Matthew E. Taylor. Towards a Semi-Autonomous Wheelchair for Users with ALS. In Proceedings of Workshop on Autonomous Mobile Service Robots (at IJCAI), New York City, NY, USA, July 2016.

This paper discusses a prototype system built over two years by teams of undergraduate students with the goal of assisting users with Amyotrophic Lateral Sclerosis (ALS). The current prototype powered wheelchair uses both onboard and offboard sensors to navigate within and between rooms, avoiding obstacles. The wheelchair can be directly controlled via multiple input devices, including gaze tracking — in this case, the wheelchair can augment the user’s control to avoid obstacles. In its fully autonomous mode, the user can select a position on a pre-built map and the wheelchair will navigate to the desired location. This paper introduces the design and implementation of our system, as well as performs three sets of experiments to characterize its performance. The long-term goal of this work is to significantly improve the lives of users with mobility impairments, with a particular focus on those that have limited motor abilities.

@inproceedings{2016IJCAI-Xu,
author={Ruofei Xu and Robin Hartshorn and Ryan Huard and James Irwin and Kaitlyn Johnson and Gregory Nelson and Jon Campbell and Sakire Arslan Ay and Matthew E. Taylor},
title={{Towards a Semi-Autonomous Wheelchair for Users with {ALS}}},
booktitle={{Proceedings of Workshop on Autonomous Mobile Service Robots (at {IJCAI})}},
year={2016},
address={New York City, NY, USA},
month={July},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={This paper discusses a prototype system built over two years by teams of undergraduate students with the goal of assisting users with Amyotrophic Lateral Sclerosis (ALS). The current prototype powered wheelchair uses both onboard and offboard sensors to navigate within and between rooms, avoiding obstacles. The wheelchair can be directly controlled via multiple input devices, including gaze tracking --- in this case, the wheelchair can augment the user's control to avoid obstacles. In its fully autonomous mode, the user can select a position on a pre-built map and the wheelchair will navigate to the desired location. This paper introduces the design and implementation of our system, as well as performs three sets of experiments to characterize its performance. The long-term goal of this work is to significantly improve the lives of users with mobility impairments, with a particular focus on those that have limited motor abilities.}
}

• Yusen Zhan, Haitham Bou Ammar, and Matthew E. Taylor. Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), July 2016. 25% acceptance rate

Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher’s advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.

@inproceedings{2016IJCAI-Zhan,
author={Yusen Zhan and Haitham Bou Ammar and Matthew E. Taylor},
title={{Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer}},
booktitle={{Proceedings of the 25th International Joint Conference on Artificial Intelligence ({IJCAI})}},
month={July},
year={2016},
note={25% acceptance rate},
bib2html_pubtype={Refereed Conference},
abstract={Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher’s advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.}
}

### 2015

• Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, and Matthew E. Taylor. Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), January 2015. 27% acceptance rate

@inproceedings{2015AAAI-BouAamar,
author={Haitham Bou Ammar and Eric Eaton and Paul Ruvolo and Matthew E. Taylor},
title={{Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment}},
booktitle={{Proceedings of the 29th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
month={January},
year={2015},
note={27% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
}

• Tim Brys, Anna Harutyunyan, Halit Bener Suay, Sonia Chernova, Matthew E. Taylor, and Ann Nowé. Reinforcement Learning from Demonstration through Shaping. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015. 28.8% acceptance rate

Reinforcement learning describes how a learning agent can achieve optimal behaviour based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behaviour. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus the learned behavior. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.

@inproceedings{2015IJCAI-Brys,
author={Tim Brys and Anna Harutyunyan and Halit Bener Suay and Sonia Chernova and Matthew E. Taylor and Ann Now\'e},
title={{Reinforcement Learning from Demonstration through Shaping}},
booktitle={{Proceedings of the International Joint Conference on Artificial Intelligence ({IJCAI})}},
year={2015},
note={28.8% acceptance rate},
bib2html_rescat={Reinforcement Learning},
bib2html_pubtype={Refereed Conference},
abstract={Reinforcement learning describes how a learning agent can achieve optimal behaviour based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behaviour. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus the learned behavior. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.}
}

• Tim Brys, Anna Harutyunyan, Matthew E. Taylor, and Ann Nowé. Ensembles of Shapings. In The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2015. 15% acceptance rate for oral presentations

Many reinforcement learning algorithms try to solve a problem from scratch, i.e., without a priori knowledge. This works for small and simple problems, but quickly becomes impractical as problems of growing complexity are tackled. The reward function with which the agent evaluates its behaviour often is sparse and uninformative, which leads to the agent requiring large amounts of exploration before feedback is discovered and good behaviour can be generated. Reward shaping is one approach to address this problem, by enriching the reward signal with extra intermediate rewards, often of a heuristic nature. These intermediate rewards may be derived from expert knowledge, knowledge transferred from a previous task, demonstrations provided to the agent, etc. In many domains, multiple such pieces of knowledge are available, and could all potentially benefit the agent during its learning process. We investigate the use of ensemble techniques to automatically combine these various sources of information, helping the agent learn faster than with any of the individual pieces of information alone. We empirically show that the use of such ensembles alleviates two tuning problems: (1) the problem of selecting which (combination of) heuristic knowledge to use, and (2) the problem of tuning the scaling of this information as it is injected in the original reward function. We show that ensembles are both robust against bad information and bad scalings.

@inproceedings{2015RLDM-Brys,
author={Tim Brys and Anna Harutyunyan and Matthew E. Taylor and Ann Now\'e},
title={{Ensembles of Shapings}},
booktitle={{The Multi-disciplinary Conference on Reinforcement Learning and Decision Making ({RLDM})}},
bib2html_rescat={Reinforcement Learning, Reward Shaping},
note={15% acceptance rate for oral presentations},
year={2015},
abstract={Many reinforcement learning algorithms try to solve a problem from scratch, i.e., without a priori knowledge. This works for small and simple problems, but quickly becomes impractical as problems of growing complexity are tackled. The reward function with which the agent evaluates its behaviour often is sparse and uninformative, which leads to the agent requiring large amounts of exploration before feedback is discovered and good behaviour can be generated. Reward shaping is one approach to address this problem, by enriching the reward signal with extra intermediate rewards, often of a heuristic nature. These intermediate rewards may be derived from expert knowledge, knowledge transferred from a previous task, demonstrations provided to the agent, etc. In many domains, multiple such pieces of knowledge are available, and could all potentially benefit the agent during its learning process. We investigate the use of ensemble techniques to automatically combine these various sources of information, helping the agent learn faster than with any of the individual pieces of information alone. We empirically show that the use of such ensembles alleviates two tuning problems: (1) the problem of selecting which (combination of) heuristic knowledge to use, and (2) the problem of tuning the scaling of this information as it is injected in the original reward function. We show that ensembles are both robust against bad information and bad scalings.}
}

• Tim Brys, Anna Harutyunyan, Matthew E. Taylor, and Ann Nowé. Policy Transfer using Reward Shaping. In The 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2015. 25% acceptance rate

Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, a most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespective of the learning algorithm used in the source task. We advance the state-of-the-art by using a reward shaping approach to policy transfer. One of the advantages in following such an approach, is that it firmly grounds policy transfer in an actively developing body of theoretical research on reward shaping. Experiments in Mountain Car, Cart Pole and Mario demonstrate the practical usefulness of the approach.

@inproceedings{2015AAMAS-Brys,
author={Tim Brys and Anna Harutyunyan and Matthew E. Taylor and Ann Now\'{e}},
title={{Policy Transfer using Reward Shaping}},
booktitle={{The 14th International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2015},
note={25% acceptance rate},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_pubtype={Refereed Conference},
abstract={Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, a most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespective of the learning algorithm used in the source task. We advance the state-of-the-art by using a reward shaping approach to policy transfer. One of the advantages in following such an approach, is that it firmly grounds policy transfer in an actively developing body of theoretical research on reward shaping. Experiments in Mountain Car, Cart Pole and Mario demonstrate the practical usefulness of the approach.},
}

• Gabriel V. de la Cruz Jr., Bei Peng, Walter S. Lasecki, and Matthew E. Taylor. Towards Integrating Real-Time Crowd Advice with Reinforcement Learning. In The 20th ACM Conference on Intelligent User Interfaces (IUI), March 2015. Poster: 41% acceptance rate for poster submissions

Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Demonstrating that the crowd is capable of generating this input, and discussing the types of errors that occur, serves as a critical first step in designing systems that use this real-time feedback to improve systems’ learning performance on-the-fly.

@inproceedings{2015IUI-Delacruz,
author={de la Cruz, Jr., Gabriel V. and Peng, Bei and Lasecki, Walter S. and Taylor, Matthew E.},
title={{Towards Integrating Real-Time Crowd Advice with Reinforcement Learning}},
booktitle={{The 20th {ACM} Conference on Intelligent User Interfaces ({IUI})}},
month={March},
year={2015},
doi={10.1145/2732158.2732180},
note={Poster: 41% acceptance rate for poster submissions},
wwwnote={<a href="http://iui.acm.org/2015/">ACM iUI-15</a>},
bib2html_rescat={Reinforcement Learning, Crowdsourcing},
bib2html_pubtype={Short Refereed Conference},
bib2html_funding={NSF},
abstract={Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Demonstrating that the crowd is capable of generating this input, and discussing the types of errors that occur, serves as a critical first step in designing systems that use this real-time feedback to improve systems' learning performance on-the-fly.},
}

• Gabriel V. de la Cruz Jr., Bei Peng, Walter S. Lasecki, and Matthew E. Taylor. Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents. In Proceedings of the Learning for General Competency in Video Games workshop (AAAI), January 2015.

Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Our results demonstrate that the crowd is capable of generating helpful input. We conclude with a discussion of the types of errors that occur most commonly when engaging human workers for this task, and a discussion of how such data could be used to improve learning. Our work serves as a critical first step in designing systems that use real-time human feedback to improve the learning performance of automated systems on-the-fly.

@inproceedings{2015AAAI-Delacruz,
title={{Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents}},
author={de la Cruz, Jr., Gabriel V. and Peng, Bei and Lasecki, Walter S. and Taylor, Matthew E.},
booktitle={{Proceedings of the Learning for General Competency in Video Games workshop ({AAAI})}},
month={January},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Crowdsourcing},
bib2html_funding={NSF},
abstract={Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Our results demonstrate that the crowd is capable of generating helpful input. We conclude with a discussion of the types of errors that occur most commonly when engaging human workers for this task, and a discussion of how such data could be used to improve learning. Our work serves as a critical first step in designing systems that use real-time human feedback to improve the learning performance of automated systems on-the-fly.},
}

• William Curran, Tim Brys, Matthew E. Taylor, and William D. Smart. Using PCA to Efficiently Represent State Spaces. In ICML-2015 European Workshop on Reinforcement Learning, Lille, France, July 2015.

Reinforcement learning algorithms need to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces. This is known as the curse of dimensionality. By projecting the agent’s state onto a low-dimensional manifold, we can represent the state space in a smaller and more efficient representation. By using this representation during learning, the agent can converge to a good policy much faster. We test this approach in the Mario Benchmarking Domain. When using dimensionality reduction in Mario, learning converges much faster to a good policy. But, there is a critical convergence-performance trade-off. By projecting onto a low-dimensional manifold, we are ignoring important data. In this paper, we explore this trade-off of convergence and performance. We find that by learning in as few as 4 dimensions (instead of 9), we can improve performance beyond learning in the full-dimensional space, at a faster convergence rate.

@inproceedings{2015ICML-Curran,
author={William Curran and Tim Brys and Matthew E. Taylor and William D. Smart},
title={{Using PCA to Efficiently Represent State Spaces}},
booktitle={{{ICML}-2015 European Workshop on Reinforcement Learning}},
month={July},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Reinforcement learning algorithms need to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces. This is known as the curse of dimensionality. By projecting the agent’s state onto a low-dimensional manifold, we can represent the state space in a smaller and more efficient representation. By using this representation during learning, the agent can converge to a good policy much faster. We test this approach in the Mario Benchmarking Domain. When using dimensionality reduction in Mario, learning converges much faster to a good policy. But, there is a critical convergence-performance trade-off. By projecting onto a low-dimensional manifold, we are ignoring important data. In this paper, we explore this trade-off of convergence and performance. We find that by learning in as few as 4 dimensions (instead of 9), we can improve performance beyond learning in the full-dimensional space, at a faster convergence rate.}
}
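The projection step the abstract describes can be sketched with plain NumPy: fit principal components from a batch of observed states, then hand the learner the low-dimensional projection instead of the raw state. The 9-to-4 dimensions below mirror the Mario numbers from the abstract; everything else (random data, function names) is illustrative only:

```python
import numpy as np

def fit_pca(states, k):
    """Fit a k-dimensional PCA projection from a batch of observed states.
    Returns the data mean and the top-k principal directions."""
    mean = states.mean(axis=0)
    # SVD of the centered data matrix; rows of vt are principal directions
    _, _, vt = np.linalg.svd(states - mean, full_matrices=False)
    return mean, vt[:k]

def project(state, mean, components):
    """Map a raw state onto the low-dimensional manifold used for learning."""
    return components @ (state - mean)

# Hypothetical example: compress 9-dimensional states down to 4 dimensions
# before passing them to the value-function approximator.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 9))
mean, comps = fit_pca(states, k=4)
z = project(states[0], mean, comps)
```

The trade-off the paper studies then amounts to choosing `k`: fewer dimensions speed convergence but discard state information.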

• Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. Transfer learning with probabilistic mapping selection. Adaptive Behavior, 23(1):3-19, 2015.

When transferring knowledge between reinforcement learning agents with different state representations or actions, past knowledge must be efficiently mapped to novel tasks so that it aids learning. The majority of the existing approaches use pre-defined mappings provided by a domain expert. To overcome this limitation and enable autonomous transfer learning, this paper introduces a method for weighting and using multiple inter-task mappings based on a probabilistic framework. Experimental results show that the use of multiple inter-task mappings, accompanied with a probabilistic selection mechanism, can significantly boost the performance of transfer learning relative to 1) learning without transfer and 2) using a single hand-picked mapping. We especially introduce novel tasks for transfer learning in a realistic simulation of the iCub robot, demonstrating the ability of the method to select mappings in complex tasks where human intuition could not be applied to select them. The results verified the efficacy of the proposed approach in a real world and complex environment.

@article{2015AdaptiveBehavior-Fachantidis,
author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
title={{Transfer learning with probabilistic mapping selection}},
journal={Adaptive Behavior},
volume={23},
number={1},
pages={3-19},
year={2015},
doi={10.1177/1059712314559525},
abstract={When transferring knowledge between reinforcement learning agents with different state representations or actions, past knowledge must be efficiently mapped to novel tasks so that it aids learning. The majority of the existing approaches use pre-defined mappings provided by a domain expert. To overcome this limitation and enable autonomous transfer learning, this paper introduces a method for weighting and using multiple inter-task mappings based on a probabilistic framework. Experimental results show that the use of multiple inter-task mappings, accompanied with a probabilistic selection mechanism, can significantly boost the performance of transfer learning relative to 1) learning without transfer and 2) using a single hand-picked mapping. We especially introduce novel tasks for transfer learning in a realistic simulation of the iCub robot, demonstrating the ability of the method to select mappings in complex tasks where human intuition could not be applied to select them. The results verified the efficacy of the proposed approach in a real world and complex environment.},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
}

• Pablo Hernandez-Leal, Matthew E. Taylor, Enrique Munoz de Cote, and Enrique L. Sucar. Learning Against Non-Stationary Opponents in Double Auctions. In Proceedings of the Adaptive Learning Agents (ALA) workshop 2015, Istanbul, Turkey, May 2015. Finalist for Best Student Paper

Energy markets are emerging around the world. In this context, the PowerTAC competition has gained attention for being a realistic and powerful simulation platform that can be used to perform robust research on retail energy markets. Agents in this complex environment typically use different strategies throughout their interaction, changing from one to another depending on diverse factors, for example, to adapt to population needs and to keep the competitors guessing. This poses a problem for learning algorithms as most of them are not capable of handling changing strategies. The previous champion of the PowerTAC competition is no exception, and is not capable of adapting quickly to non-stationary opponents, potentially impacting its performance. This paper introduces DriftER, an algorithm that learns a model of the opponent and keeps track of its error-rate. When the error-rate increases for several timesteps, the opponent has most likely changed strategy and the agent should learn a new model. Results in the PowerTAC simulator show that DriftER is capable of detecting switches in the opponent faster than existing state-of-the-art algorithms against switching (non-stationary) opponents, obtaining better results in terms of profit and accuracy.

@inproceedings{2015ALA-HernandezLeal,
author={Pablo Hernandez-Leal and Matthew E. Taylor and Enrique Munoz {de Cote} and L. Enrique Sucar},
title={{Learning Against Non-Stationary Opponents in Double Auctions}},
booktitle={{Proceedings of the Adaptive Learning Agents ({ALA}) workshop 2015}},
year={2015},
month={May},
note = {Finalist for Best Student Paper},
abstract = {Energy markets are emerging around the world. In this context, the PowerTAC competition has gained attention for being a realistic and powerful simulation platform that can be used to perform robust research on retail energy markets. Agents in this complex environment typically use different strategies throughout their interaction, changing from one to another depending on diverse factors, for example, to adapt to population needs and to keep the competitors guessing. This poses a problem for learning algorithms, as most of them are not capable of handling changing strategies. The previous champion of the PowerTAC competition is no exception, and is not capable of adapting quickly to non-stationary opponents, potentially impacting its performance. This paper introduces DriftER, an algorithm that learns a model of the opponent and keeps track of its error rate. When the error rate increases for several timesteps, the opponent has most likely changed strategy and the agent should learn a new model. Results in the PowerTAC simulator show that DriftER detects switches in the opponent faster than existing state-of-the-art algorithms for switching (non-stationary) opponents, obtaining better results in terms of profit and accuracy.}
}
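The switch-detection idea the DriftER abstract describes (track the opponent model's error rate and flag a likely strategy change when it rises for several consecutive timesteps) can be sketched as follows. This is an illustrative sketch only, with hypothetical names, window sizes, and thresholds; the paper's actual algorithm differs in detail:

```python
from collections import deque

class DriftDetector:
    """Minimal sketch of error-rate-based switch detection (illustrative only).

    Tracks a sliding-window error rate of an opponent model's predictions and
    flags a likely strategy switch when the rate rises for `patience`
    consecutive updates.
    """

    def __init__(self, window=20, patience=3):
        self.errors = deque(maxlen=window)  # recent 0/1 prediction errors
        self.patience = patience            # consecutive rises before flagging
        self.prev_rate = 0.0
        self.rises = 0

    def update(self, predicted, observed):
        """Record one prediction outcome; return True if a switch is suspected."""
        self.errors.append(0 if predicted == observed else 1)
        rate = sum(self.errors) / len(self.errors)
        self.rises = self.rises + 1 if rate > self.prev_rate else 0
        self.prev_rate = rate
        return self.rises >= self.patience
```

On a switch signal, the agent would discard the current opponent model and begin learning a new one, per the abstract.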

• Pablo Hernandez-Leal, Matthew E. Taylor, Enrique Munoz de Cote, and L. Enrique Sucar. Bidding in Non-Stationary Energy Markets. In The 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2015. Extended Abstract: 25% acceptance rate for papers, additional 22% for extended abstracts

The PowerTAC competition has gained attention for being a realistic and powerful simulation platform used for research on retail energy markets, in part because of the growing number of energy markets worldwide. Agents in this complex environment typically use multiple strategies, changing from one to another, posing a problem for current learning algorithms. This paper introduces DriftER, an algorithm that learns an opponent model and tracks its error rate. We compare our algorithm in the PowerTAC simulator against the champion of the 2013 competition and a state-of-the-art algorithm tailored for interacting against switching (non-stationary) opponents. The results show that DriftER outperforms the competition in terms of profit and accuracy.

@inproceedings{2015AAMAS-HernandezLeal,
author={Pablo Hernandez-Leal and Matthew E. Taylor and Enrique Munoz {de Cote} and L. Enrique Sucar},
title={{Bidding in Non-Stationary Energy Markets}},
booktitle={{The 14th International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2015},
note={Extended Abstract: 25% acceptance rate for papers, additional 22% for extended abstracts},
bib2html_rescat={Multiagent Systems},
bib2html_pubtype={Short Refereed Conference},
abstract={The PowerTAC competition has gained attention for being a realistic and powerful simulation platform used for research on retail energy markets, in part because of the growing number of energy markets worldwide. Agents in this complex environment typically use multiple strategies, changing from one to another, posing a problem for current learning algorithms. This paper introduces DriftER, an algorithm that learns an opponent model and tracks its error rate. We compare our algorithm in the PowerTAC simulator against the champion of the 2013 competition and a state-of-the-art algorithm tailored for interacting against switching (non-stationary) opponents. The results show that DriftER outperforms the competition in terms of profit and accuracy.},
}

• Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, Jeff Huang, and David L. Roberts. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Journal of Autonomous Agents and Multi-Agent Systems, pages 1-30, 2015.

For real-world applications, virtual agents must be able to learn new behaviors from non-technical users. Positive and negative feedback are an intuitive way to train new behaviors, and existing work has presented algorithms for learning from such feedback. That work, however, treats feedback as numeric reward to be maximized, and assumes that all trainers provide feedback in the same way. In this work, we show that users can provide feedback in many different ways, which we describe as “training strategies.” Specifically, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa, such that the lack of feedback itself conveys information about the behavior. We present a probabilistic model of trainer feedback that describes how a trainer chooses to provide explicit reward and/or explicit punishment and, based on this model, develop two novel learning algorithms (SABL and I-SABL) which take trainer strategy into account, and can therefore learn from cases where no feedback is provided. Through online user studies we demonstrate that these algorithms can learn with less feedback than algorithms based on a numerical interpretation of feedback. Furthermore, we conduct an empirical analysis of the training strategies employed by users, and of factors that can affect their choice of strategy.

@article{2015AAMAS-Loftin,
author={Robert Loftin and Bei Peng and James MacGlashan and Michael L. Littman and Matthew E. Taylor and Jeff Huang and David L. Roberts},
title={{Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning}},
journal={{Journal of Autonomous Agents and Multi-Agent Systems}},
pages={1--30},
year={2015},
doi={10.1007/s10458-015-9283-7},
publisher={Springer},
abstract={ For real-world applications, virtual agents must be able to learn new behaviors from non-technical users. Positive and negative feedback are an intuitive way to train new behaviors, and existing work has presented algorithms for learning from such feedback. That work, however, treats feedback as numeric reward to be maximized, and assumes that all trainers provide feedback in the same way. In this work, we show that users can provide feedback in many different ways, which we describe as “training strategies.” Specifically, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa, such that the lack of feedback itself conveys information about the behavior. We present a probabilistic model of trainer feedback that describes how a trainer chooses to provide explicit reward and/or explicit punishment and, based on this model, develop two novel learning algorithms (SABL and I-SABL) which take trainer strategy into account, and can therefore learn from cases where no feedback is provided. Through online user studies we demonstrate that these algorithms can learn with less feedback than algorithms based on a numerical interpretation of feedback. Furthermore, we conduct an empirical analysis of the training strategies employed by users, and of factors that can affect their choice of strategy. },
}
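The core idea in the SABL line of work above (the *absence* of feedback is itself evidence, interpreted through a model of the trainer's strategy) can be illustrated with a one-step Bayesian update. This is a hedged sketch under assumed parameter names, not the paper's exact model:

```python
def sabl_update(p_correct, feedback, mu_plus=0.3, mu_minus=0.3):
    """One Bayesian update in the spirit of strategy-aware learning (sketch).

    p_correct : prior probability that the last action was correct.
    feedback  : '+' (explicit reward), '-' (explicit punishment), '0' (none).
    mu_plus   : prob. the trainer withholds reward when the action is correct.
    mu_minus  : prob. the trainer withholds punishment when it is incorrect.
    """
    # Likelihood of the observed feedback under each hypothesis
    # (simplifying assumption: the trainer never gives erroneous feedback).
    if feedback == '+':
        l_corr, l_incorr = 1.0 - mu_plus, 0.0
    elif feedback == '-':
        l_corr, l_incorr = 0.0, 1.0 - mu_minus
    else:  # silence: how often each trainer type withholds feedback
        l_corr, l_incorr = mu_plus, mu_minus
    z = l_corr * p_correct + l_incorr * (1.0 - p_correct)
    return l_corr * p_correct / z if z > 0 else p_correct
```

For a reward-sparing trainer (high `mu_plus`, low `mu_minus`), silence after an action raises the posterior that the action was correct, which is exactly the "learning something from nothing" effect the abstract describes.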

• Bei Peng, Robert Loftin, James MacGlashan, Michael L. Littman, Matthew E. Taylor, and David L. Roberts. Language and Policy Learning from Human-delivered Feedback. In Proceedings of the Machine Learning for Social Robotics workshop (at ICRA), May 2015.

Using rewards and punishments is a common and familiar paradigm for humans to train intelligent agents. Most existing learning algorithms in this paradigm follow a framework in which human feedback is treated as a numerical signal to be maximized by the agent. However, treating feedback as a numeric signal fails to capitalize on implied information the human trainer conveys with a lack of explicit feedback. For example, a trainer may withhold reward to signal to the agent a failure, or they may withhold punishment to signal that the agent is behaving correctly. We review our progress to date with Strategy-aware Bayesian Learning, which is able to learn from experience the ways trainers use feedback, and can exploit that knowledge to accelerate learning. Our work covers contextual bandits, goal-directed sequential decision-making tasks, and natural language command learning. We present a user study design to identify how users’ feedback strategies are affected by properties of the environment and agent competency for natural language command learning in sequential decision making tasks, which will inform the development of more adaptive models of human feedback in the future.

@inproceedings{2015ICRA-Peng,
author={Bei Peng and Robert Loftin and James MacGlashan and Michael L. Littman and Matthew E. Taylor and David L. Roberts},
title={{Language and Policy Learning from Human-delivered Feedback}},
booktitle={{Proceedings of the Machine Learning for Social Robotics workshop (at {ICRA})}},
month={May},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Using rewards and punishments is a common and familiar paradigm for humans to train intelligent agents. Most existing learning algorithms in this paradigm follow a framework in which human feedback is treated as a numerical signal to be maximized by the agent. However, treating feedback as a numeric signal fails to capitalize on implied information the human trainer conveys with a lack of explicit feedback. For example, a trainer may withhold reward to signal to the agent a failure, or they may withhold punishment to signal that the agent is behaving correctly. We review our progress to date with Strategy-aware Bayesian Learning, which is able to learn from experience the ways trainers use feedback, and can exploit that knowledge to accelerate learning. Our work covers contextual bandits, goal-directed sequential decision-making tasks, and natural language command learning. We present a user study design to identify how users’ feedback strategies are affected by properties of the environment and agent competency for natural language command learning in sequential decision making tasks, which will inform the development of more adaptive models of human feedback in the future.}
}

• Mitchell Scott, Bei Peng, Madeline Chili, Tanay Nigam, Francis Pascual, Cynthia Matuszek, and Matthew E. Taylor. On the Ability to Provide Demonstrations on a UAS: Observing 90 Untrained Participants Abusing a Flying Robot. In Proceedings of the AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction (AI-HRI), November 2015.

This paper presents an exploratory study where participants piloted a commercial UAS (unmanned aerial system) through an obstacle course. The goal was to determine how varying the instructions given to participants affected their performance. Preliminary data suggests future studies to perform, as well as guidelines for human-robot interaction, and some best practices for learning from demonstration studies.

@inproceedings{2015AI_HRI-Scott,
author={Mitchell Scott and Bei Peng and Madeline Chili and Tanay Nigam and Francis Pascual and Cynthia Matuszek and Matthew E. Taylor},
title={{On the Ability to Provide Demonstrations on a UAS: Observing 90 Untrained Participants Abusing a Flying Robot}},
booktitle={{Proceedings of the {AAAI} Fall Symposium on Artificial Intelligence and Human-Robot Interaction ({AI-HRI})}},
month={November},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={This paper presents an exploratory study where participants piloted a commercial UAS (unmanned aerial system) through an obstacle course. The goal was to determine how varying the instructions given to participants affected their performance. Preliminary data suggests future studies to perform, as well as guidelines for human-robot interaction, and some best practices for learning from demonstration studies.}
}

• Halit Bener Suay, Tim Brys, Matthew E. Taylor, and Sonia Chernova. Reward Shaping by Demonstration. In The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2015.

Potential-based reward shaping is a theoretically sound way of incorporating prior knowledge in a reinforcement learning setting. While providing flexibility for choosing the potential function, under certain conditions this method guarantees the convergence of the final policy, regardless of the properties of the potential function. However, this flexibility of choice may cause confusion when making a design decision for a specific domain, as the number of possible candidates for a potential function can be overwhelming. Moreover, the potential function either can be manually designed, to bias the behavior of the learner, or can be recovered from prior knowledge, e.g. from human demonstrations. In this paper we investigate the efficacy of two different methods of using a potential function recovered from human demonstrations. Our first approach uses a mixture of Gaussian distributions generated by samples collected during demonstrations (Gaussian-Shaping), and the second approach uses a reward function recovered from demonstrations with Relative Entropy Inverse Reinforcement Learning (RE-IRL-Shaping). We present our findings in Cart-Pole, Mountain Car, and Puddle World domains. Our results show that Gaussian-Shaping can provide an efficient reward heuristic, accelerating learning through its ability to capture local information, and RE-IRL-Shaping can be more resilient to bad demonstrations. We report a brief analysis of our findings and we aim to provide a future reference for reinforcement learning agent designers who consider using reward shaping by human demonstrations.

@inproceedings{2015RLDM-Suay,
author={Halit Bener Suay and Tim Brys and Matthew E. Taylor and Sonia Chernova},
title={{Reward Shaping by Demonstration}},
booktitle={{The Multi-disciplinary Conference on Reinforcement Learning and Decision Making ({RLDM})}},
year={2015},
bib2html_rescat={Reinforcement Learning, Reward Shaping, Learning from Demonstration},
abstract={Potential-based reward shaping is a theoretically sound way of incorporating prior knowledge in a reinforcement learning setting. While providing flexibility for choosing the potential function, under certain conditions this method guarantees the convergence of the final policy, regardless of the properties of the potential function. However, this flexibility of choice may cause confusion when making a design decision for a specific domain, as the number of possible candidates for a potential function can be overwhelming. Moreover, the potential function either can be manually designed, to bias the behavior of the learner, or can be recovered from prior knowledge, e.g. from human demonstrations. In this paper we investigate the efficacy of two different methods of using a potential function recovered from human demonstrations. Our first approach uses a mixture of Gaussian distributions generated by samples collected during demonstrations (Gaussian-Shaping), and the second approach uses a reward function recovered from demonstrations with Relative Entropy Inverse Reinforcement Learning (RE-IRL-Shaping). We present our findings in Cart-Pole, Mountain Car, and Puddle World domains. Our results show that Gaussian-Shaping can provide an efficient reward heuristic, accelerating learning through its ability to capture local information, and RE-IRL-Shaping can be more resilient to bad demonstrations. We report a brief analysis of our findings and we aim to provide a future reference for reinforcement learning agent designers who consider using reward shaping by human demonstrations.}
}
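The potential-based shaping scheme underlying the paper above follows the standard form r' = r + γΦ(s') − Φ(s), which leaves the optimal policy unchanged for any potential function Φ. A minimal sketch (function names are illustrative; Φ here stands in for either the Gaussian-mixture density or the RE-IRL-recovered reward the paper studies):

```python
def shaped_reward(r, phi_s, phi_s_next, gamma=0.99):
    """Potential-based reward shaping: r' = r + gamma * Phi(s') - Phi(s).

    For any state potential Phi, adding this term preserves the optimal
    policy; Phi may be hand-designed or recovered from demonstrations,
    e.g. as a Gaussian mixture density over demonstrated states.
    """
    return r + gamma * phi_s_next - phi_s
```

Because the shaping term telescopes along trajectories, the learner is biased toward high-potential (demonstrated) regions without changing which policy is ultimately optimal.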

• Yusen Zhan and Matthew E. Taylor. Online Transfer Learning in Reinforcement Learning Domains. In Proceedings of the AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), November 2015.

This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation under a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.

@inproceedings{2015SDMIA-Zhan,
author={Yusen Zhan and Matthew E. Taylor},
title={{Online Transfer Learning in Reinforcement Learning Domains}},
booktitle={{Proceedings of the {AAAI} Fall Symposium on Sequential Decision Making for Intelligent Agents ({SDMIA})}},
month={November},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation under a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.}
}

• Yawei Zhang, Yunxiang Ye, Zhaodong Wang, Matthew E. Taylor, Geoffrey A. Hollinger, and Qin Zhang. Intelligent In-Orchard Bin-Managing System for Tree Fruit Production. In Proceedings of the Robotics in Agriculture workshop (at ICRA), May 2015.

The labor-intensive nature of harvest in the tree fruit industry makes it particularly sensitive to labor shortages. Technological innovation is thus critical in order to meet current demands without significantly increasing prices. This paper introduces a robotic system to help human workers during fruit harvest. A second-generation prototype is currently being built and simulation results demonstrate potential improvement in productivity.

@inproceedings{2015ICRA-Zhang,
author={Yawei Zhang and Yunxiang Ye and Zhaodong Wang and Matthew E. Taylor and Geoffrey A. Hollinger and Qin Zhang},
title={{Intelligent In-Orchard Bin-Managing System for Tree Fruit Production}},
booktitle={{Proceedings of the Robotics in Agriculture workshop (at {ICRA})}},
month={May},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={The labor-intensive nature of harvest in the tree fruit industry makes it particularly sensitive to labor shortages. Technological innovation is thus critical in order to meet current demands without significantly increasing prices. This paper introduces a robotic system to help human workers during fruit harvest. A second-generation prototype is currently being built and simulation results demonstrate potential improvement in productivity.}
}

### 2014

• Haitham Bou Ammar, Eric Eaton, Matthew E. Taylor, Decebal C. Mocanu, Kurt Driessens, Gerhard Weiss, and Karl Tuyls. An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning. In Proceedings of the Machine Learning for Interactive Systems workshop (at AAAI), July 2014.
@inproceedings{2014MLIS-BouAmmar,
title={{An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning}},
author={Haitham Bou Ammar and Eric Eaton and Matthew E. Taylor and Decebal C. Mocanu and Kurt Driessens and Gerhard Weiss and Karl Tuyls},
booktitle={{Proceedings of the Machine Learning for Interactive Systems workshop (at {AAAI})}},
month={July},
year={2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Transfer Learning}
}

• Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, and Matthew E. Taylor. Online Multi-Task Learning for Policy Gradient Methods. In Proceedings of the 31st International Conference on Machine Learning (ICML), June 2014. 25% acceptance rate
@inproceedings{2014ICML-BouAmmar,
author={Haitham Bou Ammar and Eric Eaton and Paul Ruvolo and Matthew E. Taylor},
title={{Online Multi-Task Learning for Policy Gradient Methods}},
booktitle={{Proceedings of the 31st International Conference on Machine Learning ({ICML})}},
note={25% acceptance rate},
month={June},
year={2014},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
}

• Tim Brys, Matthew E. Taylor, and Ann Nowé. Using Ensemble Techniques and Multi-Objectivization to Solve Reinforcement Learning Problems. In Proceedings of the 21st European Conference on Artificial Intelligence (ECAI), August 2014. 41% acceptance rate for short papers
@inproceedings{2014ECAI-Brys,
author={Tim Brys and Matthew E. Taylor and Ann Now\'{e}},
title={{Using Ensemble Techniques and Multi-Objectivization to Solve Reinforcement Learning Problems}},
booktitle={{Proceedings of the 21st European Conference on Artificial Intelligence ({ECAI})}},
month={August},
year={2014},
note={41% acceptance rate for short papers},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Tim Brys, Anna Harutyunyan, Peter Vrancx, Matthew E. Taylor, Daniel Kudenko, and Ann Nowé. Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping. In Proceedings of the IEEE 2014 International Joint Conference on Neural Networks (IJCNN), July 2014. 59% acceptance rate
@inproceedings{2014IJCNN-Brys,
author={Tim Brys and Anna Harutyunyan and Peter Vrancx and Matthew E. Taylor and Daniel Kudenko and Ann Now\'{e}},
title={{Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping}},
booktitle={{Proceedings of the {IEEE} 2014 International Joint Conference on Neural Networks ({IJCNN})}},
month={July},
year={2014},
note={59% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Tim Brys, Kristof Van Moffaert, Ann Nowé, and Matthew E. Taylor. Adaptive Objective Selection for Correlated Objectives in Multi-Objective Reinforcement Learning (Extended Abstract). In The 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2014. Extended abstract: 24% acceptance rate for papers, additional 22% for extended abstracts
@inproceedings{2014AAMAS-Brys,
author={Tim Brys and Kristof Van Moffaert and Ann Now\'{e} and Matthew E. Taylor},
title={{Adaptive Objective Selection for Correlated Objectives in Multi-Objective Reinforcement Learning (Extended Abstract)}},
booktitle={{The 13th International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2014},
note={Extended abstract: 24% acceptance rate for papers, additional 22% for extended abstracts},
bib2html_rescat={Reinforcement Learning},
bib2html_pubtype={Short Refereed Conference},
}

• Tim Brys, Ann Nowé, Daniel Kudenko, and Matthew E. Taylor. Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), July 2014. 28% acceptance rate
@inproceedings{2014AAAI-Brys,
author={Tim Brys and Ann Now\'{e} and Daniel Kudenko and Matthew E. Taylor},
title={{Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence}},
booktitle={{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
month={July},
year={2014},
note={28% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Tim Brys, Tong T. Pham, and Matthew E. Taylor. Distributed learning and multi-objectivity in traffic light control. Connection Science, 26(1):65-83, 2014.

Traffic jams and suboptimal traffic flows are ubiquitous in modern societies, and they create enormous economic losses each year. Delays at traffic lights alone account for roughly 10% of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning (RL) approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Distributed constraint optimisation approaches (DCOP) have also been shown to be successful, but are limited to cases where the traffic flows are known. The distributed coordination of exploration and exploitation (DCEE) framework was recently proposed to introduce learning in the DCOP framework. In this paper, we present a study of DCEE and RL techniques in a complex simulator, illustrating the particular advantages of each, comparing them against standard isolated traffic actuated signals. We analyse how learning and coordination behave under different traffic conditions, and discuss the multi-objective nature of the problem. Finally we evaluate several alternative reward signals in the best performing approach, some of these taking advantage of the correlation between the problem-inherent objectives to improve performance.

@article{2014ConnectionScience-Brys,
author={Tim Brys and Tong T. Pham and Matthew E. Taylor},
title={{Distributed learning and multi-objectivity in traffic light control}},
journal={{Connection Science}},
volume={26},
number={1},
pages={65-83},
year={2014},
doi={10.1080/09540091.2014.885282},
url={http://dx.doi.org/10.1080/09540091.2014.885282},
eprint={http://dx.doi.org/10.1080/09540091.2014.885282},
abstract={ Traffic jams and suboptimal traffic flows are ubiquitous in modern societies, and they create enormous economic losses each year. Delays at traffic lights alone account for roughly 10\% of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning (RL) approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Distributed constraint optimisation approaches (DCOP) have also been shown to be successful, but are limited to cases where the traffic flows are known. The distributed coordination of exploration and exploitation (DCEE) framework was recently proposed to introduce learning in the DCOP framework. In this paper, we present a study of DCEE and RL techniques in a complex simulator, illustrating the particular advantages of each, comparing them against standard isolated traffic actuated signals. We analyse how learning and coordination behave under different traffic conditions, and discuss the multi-objective nature of the problem. Finally we evaluate several alternative reward signals in the best performing approach, some of these taking advantage of the correlation between the problem-inherent objectives to improve performance. },
bib2html_pubtype={Journal Article},
bib2html_rescat={Reinforcement Learning, DCOP}
}

• Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. An Autonomous Transfer Learning Algorithm for TD-Learners. In Proceedings of the 8th Hellenic Conference on Artificial Intelligence (SETN), May 2014. 50% acceptance rate
@inproceedings{2014SETN-Fachantidis,
author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
title={{An Autonomous Transfer Learning Algorithm for TD-Learners}},
booktitle={{Proceedings of the 8th Hellenic Conference on Artificial Intelligence ({SETN})}},
note={50% acceptance rate},
month={May},
year={2014},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
}

• Chris HolmesParker, Matthew E. Taylor, Adrian Agogino, and Kagan Tumer. CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning (Extended Abstract). In The 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2014. Extended abstract: 24% acceptance rate for papers, additional 22% for extended abstracts
@inproceedings{2014AAMAS-HolmesParker,
author={Chris HolmesParker and Matthew E. Taylor and Adrian Agogino and Kagan Tumer},
title={{CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning (Extended Abstract)}},
booktitle={{The 13th International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2014},
note={Extended abstract: 24% acceptance rate for papers, additional 22% for extended abstracts},
bib2html_rescat={Reinforcement Learning},
bib2html_pubtype={Short Refereed Conference},
}

• Chris HolmesParker, Matthew E. Taylor, Yusen Zhan, and Kagan Tumer. Exploiting Structure and Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS), May 2014.
@inproceedings{2014ALA-HolmesParker,
author={Chris HolmesParker and Matthew E. Taylor and Yusen Zhan and Kagan Tumer},
title={{Exploiting Structure and Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS})}},
month={May},
year={2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
}

• Chris HolmesParker, Matthew E. Taylor, Adrian Agogino, and Kagan Tumer. CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning. In Proceedings of the 2014 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT), August 2014. 43% acceptance rate
@inproceedings{2014IAT-HolmesParker,
author={Chris HolmesParker and Matthew E. Taylor and Adrian Agogino and Kagan Tumer},
title={{{CLEAN}ing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning}},
booktitle={{Proceedings of the 2014 {IEEE/WIC/ACM} International Conference on Intelligent Agent Technology ({IAT})}},
month={August},
year={2014},
note={43% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, Jeff Huang, and David L. Roberts. A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), July 2014. 28% acceptance rate
@inproceedings{2014AAAI-Loftin,
author={Robert Loftin and Bei Peng and James MacGlashan and Michael L. Littman and Matthew E. Taylor and Jeff Huang and David L. Roberts},
title={{A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback}},
booktitle={{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
month={July},
year={2014},
note={28% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Robert Loftin, Bei Peng, James MacGlashan, Michael Littman, Matthew E. Taylor, David Roberts, and Jeff Huang. Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), August 2014.
@inproceedings{2014ROMAN-Loftin,
author={Robert Loftin and Bei Peng and James MacGlashan and Michael Littman and Matthew E. Taylor and David Roberts and Jeff Huang},
title={{Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies}},
booktitle={{Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication ({RO-MAN})}},
month={August},
year={2014},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• James MacGlashan, Michael L. Littman, Robert Loftin, Bei Peng, David Roberts, and Matthew E. Taylor. Training an Agent to Ground Commands with Reward and Punishment. In Proceedings of the Machine Learning for Interactive Systems workshop (at AAAI), July 2014.
@inproceedings(2014MLIS-James,
title={{Training an Agent to Ground Commands with Reward and Punishment}},
author={James MacGlashan and Michael L. Littman and Robert Loftin and Bei Peng and David Roberts and Matthew E. Taylor},
booktitle={{Proceedings of the Machine Learning for Interactive Systems workshop (at {AAAI})}},
month={July},
year={2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning}
)

• Matthew E. Taylor, Nicholas Carboni, Anestis Fachantidis, Ioannis Vlahavas, and Lisa Torrey. Reinforcement learning agents providing advice in complex video games. Connection Science, 26(1):45-63, 2014.

This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the Adaptive and Learning Agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

@article{2014ConnectionScience-Taylor,
author={Matthew E. Taylor and Nicholas Carboni and Anestis Fachantidis and Ioannis Vlahavas and Lisa Torrey},
title={{Reinforcement learning agents providing advice in complex video games}},
journal={{Connection Science}},
volume={26},
number={1},
pages={45-63},
year={2014},
doi={10.1080/09540091.2014.885279},
url={http://dx.doi.org/10.1080/09540091.2014.885279},
eprint={http://dx.doi.org/10.1080/09540091.2014.885279},
abstract={This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the Adaptive and Learning Agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.},
bib2html_pubtype={Journal Article},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
}
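The budget-limited advising loop this abstract describes can be sketched in a few lines of Python. This is an illustrative toy, not the paper's algorithms: the chain task, the advise-on-mistake rule, and all parameter values are invented for this example.

```python
import random

def q_learning_with_advice(env_step, n_states, n_actions, teacher_q,
                           budget=50, episodes=200, alpha=0.5, gamma=0.95,
                           epsilon=0.1):
    """Student Q-learning in which a teacher overrides the student's choice
    whenever it deviates from the teacher's policy, until a fixed advice
    budget runs out."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Student's own epsilon-greedy choice.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: q[s][x])
            # Teacher corrects apparent mistakes while the budget lasts.
            best = max(range(n_actions), key=lambda x: teacher_q[s][x])
            if budget > 0 and a != best:
                a, budget = best, budget - 1
            s2, r, done = env_step(s, a)
            q[s][a] += alpha * (r + gamma * max(q[s2]) * (not done) - q[s][a])
            s = s2
    return q
```

On a small chain task where one action moves toward a rewarding goal, the teacher's early corrections steer the student to the reward before the budget is spent, after which the student's own value estimates take over.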

• Matthew E. Taylor and Lisa Torrey. Agents Teaching Agents in Reinforcement Learning (Nectar Abstract). In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD), September 2014. Nectar Track, 45% acceptance rate
@inproceedings{2014ECML-Taylor,
author={Matthew E. Taylor and Lisa Torrey},
title={{Agents Teaching Agents in Reinforcement Learning (Nectar Abstract)}},
booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD)}},
month={September},
year={2014},
note={Nectar Track, 45% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Yusen Zhan, Anestis Fachantidis, Ioannis Vlahavas, and Matthew E. Taylor. Agents Teaching Humans in Reinforcement Learning Tasks. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS), May 2014.
@inproceedings(2014ALA-Zhan,
author={Yusen Zhan and Anestis Fachantidis and Ioannis Vlahavas and Matthew E. Taylor},
title={{Agents Teaching Humans in Reinforcement Learning Tasks}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS})}},
month={May},
year= {2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
)

### 2013

• Haitham Bou Ammar, Matthew E. Taylor, Karl Tuyls, and Gerhard Weiss. Reinforcement Learning Transfer using a Sparse Coded Inter-Task Mapping. In LNAI Post-proceedings of the European Workshop on Multi-agent Systems. Springer-Verlag, 2013.
@inproceedings(LNAI13-Amar,
author={Haitham Bou Ammar and Matthew E. Taylor and Karl Tuyls and Gerhard Weiss},
title={{Reinforcement Learning Transfer using a Sparse Coded Inter-Task Mapping}},
booktitle={{LNAI Post-proceedings of the European Workshop on Multi-agent Systems}},
year={2013},
publisher={Springer-Verlag},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
bib2html_pubtype={Refereed Book Chapter},
)

• Haitham Bou Ammar, Decebal Constantin Mocanu, Matthew E. Taylor, Kurt Driessens, Karl Tuyls, and Gerhard Weiss. Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines. In The 25th Benelux Conference on Artificial Intelligence (BNAIC), November 2013.
@inproceedings{BNAIC13-BouAamar,
author={Haitham Bou Ammar and Decebal Constantin Mocanu and Matthew E. Taylor and Kurt Driessens and Karl Tuyls and Gerhard Weiss},
title={{Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines}},
booktitle={{The 25th Benelux Conference on Artificial Intelligence ({BNAIC})}},
month={November},
year={2013},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_pubtype={Short Refereed Conference},
}

• Haitham Bou Ammar, Decebal Constantin Mocanu, Matthew E. Taylor, Kurt Driessens, Karl Tuyls, and Gerhard Weiss. Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), September 2013. 25% acceptance rate
@inproceedings{ECML13-BouAamar,
author={Haitham Bou Ammar and Decebal Constantin Mocanu and Matthew E. Taylor and Kurt Driessens and Karl Tuyls and Gerhard Weiss},
title={{Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines}},
booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ({ECML PKDD})}},
month={September},
year = {2013},
note = {25% acceptance rate},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning},
}

• Ravi Balasubramanian and Matthew E. Taylor. Learning for Mobile-Robot Error Recovery (Extended Abstract). In The AAAI 2013 Spring Symposium — Designing Intelligent Robots: Reintegrating AI II, March 2013.
@inproceedings(AAAI13Symp-Balasubramanian,
author={Ravi Balasubramanian and Matthew E. Taylor},
title={{Learning for Mobile-Robot Error Recovery (Extended Abstract)}},
booktitle={{The {AAAI} 2013 Spring Symposium --- Designing Intelligent Robots: Reintegrating {AI} {II}}},
month={March},
year= {2013},
wwwnote={<a href="http://people.csail.mit.edu/gdk/dir2/">Designing Intelligent Robots</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning, Robotics},
)

• Nicholas Carboni and Matthew E. Taylor. Preliminary Results for 1 vs. 1 Tactics in StarCraft. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), May 2013.

This paper describes the development and analysis of two algorithms designed to allow one agent, the teacher, to give advice to another agent, the student. These algorithms contribute to a family of algorithms designed to allow teaching with limited advice. We compare the ability of the student to learn using reinforcement learning with and without such advice. Experiments are conducted in the StarCraft domain, a challenging but appropriate domain for this type of research. Our results show that the time at which advice is given has a significant effect on the result of student learning and that agents with the best performance in a task may not always be the most effective teachers.

@inproceedings(ALA13-Carboni,
author={Nicholas Carboni and Matthew E. Taylor},
title={{Preliminary Results for 1 vs.~1 Tactics in StarCraft}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={May},
year= {2013},
wwwnote={<a href="http://swarmlab.unimaas.nl/ala2013/">ALA-13</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
abstract={This paper describes the development and analysis of two algorithms designed to allow one agent, the teacher, to give advice to another agent, the student. These algorithms contribute to a family of algorithms designed to allow teaching with limited advice. We compare the ability of the student to learn using reinforcement learning with and without such advice. Experiments are conducted in the StarCraft domain, a challenging but appropriate domain for this type of research. Our results show that the time at which advice is given has a significant effect on the result of student learning and that agents with the best performance in a task may not always be the most effective teachers.},
)

• Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. Autonomous Selection of Inter-Task Mappings in Transfer Learning (extended abstract). In The AAAI 2013 Spring Symposium — Lifelong Machine Learning, March 2013.
@inproceedings(AAAI13-Anestis,
author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
title={{Autonomous Selection of Inter-Task Mappings in Transfer Learning (extended abstract)}},
booktitle={{The {AAAI} 2013 Spring Symposium --- Lifelong Machine Learning}},
month={March},
year= {2013},
wwwnote={<a href="http://cs.brynmawr.edu/~eeaton/AAAI-SSS13-LML/">Lifelong Machine Learning</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning, Robotics},
)

• Tong Pham, Tim Brys, and Matthew E. Taylor. Learning Coordinated Traffic Light Control. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), May 2013.

Traffic jams and suboptimal traffic flows are ubiquitous in our modern societies, and they create enormous economic losses each year. Delays at traffic lights alone contribute roughly 10 percent of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Some distributed constraint optimization approaches have also been used, but focus on cases where the traffic flows are known. This paper presents a preliminary comparison between these two classes of optimization methods in a complex simulator, with the goal of eventually producing real-time algorithms that could be deployed in real-world situations.

@inproceedings(ALA13-Pham,
author={Tong Pham and Tim Brys and Matthew E. Taylor},
title={{Learning Coordinated Traffic Light Control}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={May},
year= {2013},
wwwnote={<a href="http://swarmlab.unimaas.nl/ala2013/">ALA-13</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning,DCOP},
abstract={Traffic jams and suboptimal traffic flows are ubiquitous in our modern societies, and they create enormous economic losses each year. Delays at traffic lights alone contribute roughly 10 percent of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Some distributed constraint optimization approaches have also been used, but focus on cases where the traffic flows are known. This paper presents a preliminary comparison between these two classes of optimization methods in a complex simulator, with the goal of eventually producing real-time algorithms that could be deployed in real-world situations.},
)

• Tong Pham, Aly Tawfika, and Matthew E. Taylor. A Simple, Naive Agent-based Model for the Optimization of a System of Traffic Lights: Insights from an Exploratory Experiment. In Proceedings of Conference on Agent-Based Modeling in Transportation Planning and Operations, September 2013.
@inproceedings{abm13-Pham,
author="Tong Pham and Aly Tawfika and Matthew E. Taylor",
title={{A Simple, Naive Agent-based Model for the Optimization of a System of Traffic Lights: Insights from an Exploratory Experiment}},
booktitle={{Proceedings of Conference on Agent-Based Modeling in Transportation Planning and Operations}},
month="September",
year = {2013},
bib2html_rescat = {DCOP},
bib2html_pubtype = {Refereed Conference},
}

• Lisa Torrey and Matthew E. Taylor. Teaching on a Budget: Agents Advising Agents in Reinforcement Learning. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2013. 23% acceptance rate

This paper introduces a teacher-student framework for reinforcement learning. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two experimental domains: Mountain Car and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

@inproceedings{AAMAS13-Torrey,
author="Lisa Torrey and Matthew E. Taylor",
title={{Teaching on a Budget: Agents Advising Agents in Reinforcement Learning}},
booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2013},
note = {23% acceptance rate},
wwwnote = {<a href="http://aamas2013.cs.umn.edu/">AAMAS-13</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning},
abstract = "This paper introduces a teacher-student framework for reinforcement
learning. In this framework, a teacher agent instructs a student
agent by suggesting actions the student should take as it learns.
However, the teacher may only give such advice a limited number
of times. We present several novel algorithms that teachers can
use to budget their advice effectively, and we evaluate them in two
experimental domains: Mountain Car and Pac-Man. Our results
show that the same amount of advice, given at different moments,
can have different effects on student learning, and that teachers can
significantly affect student learning even when students use different
learning methods and state representations.",
}

### 2012

• Matthew Adams, Robert Loftin, Matthew E. Taylor, Michael Littman, and David Roberts. An Empirical Analysis of RL’s Drift From Its Behaviorism Roots. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), June 2012.

We present an empirical survey of reinforcement learning techniques and relate these techniques to concepts from behaviorism, a field of psychology concerned with the learning process. Specifically, we examine two standard RL algorithms, model-free SARSA and model-based R-MAX, when used with various shaping techniques. We consider multiple techniques for incorporating shaping into these algorithms, including the use of options and potential-based shaping. Findings indicate that any improvement in sample complexity that results from shaping is limited at best. We suggest that this is either due to reinforcement learning not modeling behaviorism well, or behaviorism not modeling animal learning well. We further suggest that a paradigm shift in reinforcement learning techniques is required before the kind of learning performance that techniques from behaviorism indicate are possible can be realized.

@inproceedings(ALA12-Adams,
author={Matthew Adams and Robert Loftin and Matthew E. Taylor and Michael Littman and David Roberts},
title={{An Empirical Analysis of {RL}'s Drift From Its Behaviorism Roots}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={June},
year={2012},
wwwnote={<a href="http://como.vub.ac.be/ALA2012/">ALA-12</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
abstract={We present an empirical survey of reinforcement learning techniques and relate these techniques to concepts from behaviorism, a field of psychology concerned with the learning process. Specifically, we examine two standard RL algorithms, model-free SARSA and model-based R-MAX, when used with various shaping techniques. We consider multiple techniques for incorporating shaping into these algorithms, including the use of options and potential-based shaping. Findings indicate that any improvement in sample complexity that results from shaping is limited at best. We suggest that this is either due to reinforcement learning not modeling behaviorism well, or behaviorism not modeling animal learning well. We further suggest that a paradigm shift in reinforcement learning techniques is required before the kind of learning performance that techniques from behaviorism indicate are possible can be realized.},
)
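Potential-based shaping, one of the techniques the paper examines, adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, a form known to leave the set of optimal policies unchanged (Ng, Harada & Russell, 1999). A minimal sketch; the potential function in the usage example is invented for illustration:

```python
def shaped_reward(r, s, s2, phi, gamma=0.99):
    """Potential-based shaping: augment the environment reward r for the
    transition s -> s2 with F(s, s2) = gamma * phi(s2) - phi(s), which
    preserves the set of optimal policies."""
    return r + gamma * phi(s2) - phi(s)
```

For example, with phi(s) = -abs(goal - s) on a chain task, every step toward the goal earns a small bonus while the optimal policy is untouched.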

• Haitham Bou Ammar, Karl Tuyls, Matthew E. Taylor, Kurt Driessens, and Gerhard Weiss. Reinforcement Learning Transfer via Sparse Coding. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), June 2012. 20% acceptance rate

Although reinforcement learning (RL) has been successfully deployed in a variety of tasks, learning speed remains a fundamental problem for applying RL in complex environments. Transfer learning aims to ameliorate this shortcoming by speeding up learning through the adaptation of previously learned behaviors in similar tasks. Transfer techniques often use an inter-task mapping, which determines how a pair of tasks are related. Instead of relying on a hand-coded inter-task mapping, this paper proposes a novel transfer learning method capable of autonomously creating an inter-task mapping by using a novel combination of sparse coding, sparse projection learning and sparse Gaussian processes. We also propose two new transfer algorithms (TrLSPI and TrFQI) based on least squares policy iteration and fitted-Q-iteration. Experiments not only show successful transfer of information between similar tasks, inverted pendulum to cart pole, but also between two very different domains: mountain car to cart pole. This paper empirically shows that the learned inter-task mapping can be successfully used to (1) improve the performance of a learned policy on a fixed number of environmental samples, (2) reduce the learning times needed by the algorithms to converge to a policy on a fixed number of samples, and (3) converge faster to a near-optimal policy given a large number of samples.

@inproceedings{12AAMAS-Haitham,
author="Haitham Bou Ammar and Karl Tuyls and Matthew E. Taylor and Kurt Driessens and Gerhard Weiss",
title={{Reinforcement Learning Transfer via Sparse Coding}},
booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="June",
year = {2012},
note = {20% acceptance rate},
wwwnote = {<a href="http://aamas2012.webs.upv.es">AAMAS-12</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning},
abstract = "Although reinforcement learning (RL) has been successfully deployed
in a variety of tasks, learning speed remains a fundamental
problem for applying RL in complex environments. Transfer learning
aims to ameliorate this shortcoming by speeding up learning
through the adaptation of previously learned behaviors in similar
tasks. Transfer techniques often use an inter-task mapping, which
determines how a pair of tasks are related. Instead of relying on a
hand-coded inter-task mapping, this paper proposes a novel transfer
learning method capable of autonomously creating an inter-task
mapping by using a novel combination of sparse coding, sparse
projection learning and sparse Gaussian processes. We also propose
two new transfer algorithms (TrLSPI and TrFQI) based on
least squares policy iteration and fitted-Q-iteration. Experiments
not only show successful transfer of information between similar
tasks, inverted pendulum to cart pole, but also between two very
different domains: mountain car to cart pole. This paper empirically
shows that the learned inter-task mapping can be successfully
used to (1) improve the performance of a learned policy on a fixed
number of environmental samples, (2) reduce the learning times
needed by the algorithms to converge to a policy on a fixed number
of samples, and (3) converge faster to a near-optimal policy given
a large number of samples.",
}
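For intuition about what an inter-task mapping does (though not how this paper learns one), the sketch below initializes a target-task Q-table from a source task through hand-coded state and action mappings; the paper's contribution is learning such mappings autonomously via sparse coding rather than hand-coding them.

```python
def transfer_q(source_q, state_map, action_map, n_states, n_actions):
    """Initialize a target-task Q-table from a source-task Q-table via an
    inter-task mapping: each target (state, action) pair inherits the value
    of the source pair it maps to."""
    return [[source_q[state_map(s)][action_map(a)] for a in range(n_actions)]
            for s in range(n_states)]
```

The transferred table is only a starting point; the target-task learner then refines it with its own experience, which is where the speed-up over tabula rasa learning comes from.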

• Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. Transfer Learning via Multiple Inter-Task Mappings. In Scott Sanner and Marcus Hutter, editors, Recent Advances in Reinforcement Learning, volume 7188 of Lecture Notes in Artificial Intelligence, pages 225-236. Springer-Verlag, Berlin, 2012.
@incollection{LNAI11-Fachantidis,
author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
title={{Transfer Learning via Multiple Inter-Task Mappings}},
booktitle={{Recent Advances in Reinforcement Learning}},
editor={Scott Sanner and Marcus Hutter},
year={2012},
series={Lecture Notes in Artificial Intelligence},
volume={7188},
pages={225-236},
isbn={978-3-642-29945-2},
publisher={Springer-Verlag},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
bib2html_pubtype={Refereed Book Chapter},
}

• Sanjeev Sharma and Matthew E. Taylor. Autonomous Waypoint Generation Strategy for On-Line Navigation in Unknown Environments. In IROS Workshop on Robot Motion Planning: Online, Reactive, and in Real-Time, October 2012.
@INPROCEEDINGS{IROSWS12-Sharma,
author={Sanjeev Sharma and Matthew E. Taylor},
title={{Autonomous Waypoint Generation Strategy for On-Line Navigation in Unknown Environments}},
booktitle={{{IROS} Workshop on Robot Motion Planning: Online, Reactive, and in Real-Time}},
year={2012},
month={October},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Robotics},
}

• Lisa Torrey and Matthew E. Taylor. Help an Agent Out: Student/Teacher Learning in Sequential Decision Tasks. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), June 2012.

Research on agents has led to the development of algorithms for learning from experience, accepting guidance from humans, and imitating experts. This paper explores a new direction for agents: the ability to teach other agents. In particular, we focus on situations where the teacher has limited expertise and instructs the student through action advice. The paper proposes and evaluates several teaching algorithms based on providing advice at a gradually decreasing rate. A crucial component of these algorithms is the ability of an agent to estimate its confidence in a state. We also contribute a student/teacher framework for implementing teaching strategies, which we hope will spur additional development in this relatively unexplored area.

@inproceedings(ALA12-Torrey,
author={Lisa Torrey and Matthew E. Taylor},
title={{Help an Agent Out: Student/Teacher Learning in Sequential Decision Tasks}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={June},
year={2012},
wwwnote={<a href="http://como.vub.ac.be/ALA2012/">ALA-12</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
abstract={Research on agents has led to the development of algorithms for learning from experience, accepting guidance from humans, and imitating experts. This paper explores a new direction for agents: the ability to teach other agents. In particular, we focus on situations where the teacher has limited expertise and instructs the student through action advice. The paper proposes and evaluates several teaching algorithms based on providing advice at a gradually decreasing rate. A crucial component of these algorithms is the ability of an agent to estimate its confidence in a state. We also contribute a student/teacher framework for implementing teaching strategies, which we hope will spur additional development in this relatively unexplored area.},
)
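The abstract's two ingredients, a gradually decreasing advice rate and a per-state confidence estimate, can be illustrated as follows. Using visit counts as the confidence proxy and an exponential decay are invented simplifications for this sketch, not the paper's estimators.

```python
import math
import random
from collections import defaultdict

class DecayingAdviceTeacher:
    """Override the student's action with teacher advice, with a probability
    that decays as the student accumulates experience in a state; visit
    counts stand in for a real confidence estimate."""

    def __init__(self, teacher_policy, decay=0.1):
        self.policy = teacher_policy
        self.visits = defaultdict(int)
        self.decay = decay

    def maybe_advise(self, state, student_action):
        self.visits[state] += 1
        # More visits -> higher assumed student confidence -> less advice.
        p_advise = math.exp(-self.decay * self.visits[state])
        if random.random() < p_advise:
            return self.policy(state)
        return student_action
```

With decay set to zero the teacher always advises; with a large decay the student is left on its own almost immediately, which is the spectrum the teaching strategies in the paper navigate.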

• Lisa Torrey and Matthew E. Taylor. Towards Student/Teacher Learning in Sequential Decision Tasks. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), June 2012. Extended Abstract: 20% acceptance rate for papers, additional 23% for extended abstracts
@inproceedings{12AAMAS-Torrey,
author={Lisa Torrey and Matthew E. Taylor},
title={{Towards Student/Teacher Learning in Sequential Decision Tasks}},
booktitle={{International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={June},
year={2012},
note={Extended Abstract: 20% acceptance rate for papers, additional 23% for extended abstracts},
wwwnote={<a href="http://aamas2012.webs.upv.es">AAMAS-12</a>},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_pubtype={Short Refereed Conference},
}

### 2011

• Marcos A. M. Vieira, Matthew E. Taylor, Prateek Tandon, Manish Jain, Ramesh Govindan, Gaurav S. Sukhatme, and Milind Tambe. Mitigating Multi-path Fading in a Mobile Mesh Network. Ad Hoc Networks Journal, 2011.
@article{ADHOC11-Vieira,
author={Marcos A.~M.~Vieira and Matthew E. Taylor and Prateek Tandon and Manish Jain and Ramesh Govindan and Gaurav S.~Sukhatme and Milind Tambe},
title={{Mitigating Multi-path Fading in a Mobile Mesh Network}},
journal={{Ad Hoc Networks Journal}},
year={2011},
bib2html_pubtype={Journal Article},
bib2html_rescat={DCOP}
}

• Scott Alfeld, Kumera Berkele, Stephen A. Desalvo, Tong Pham, Daniel Russo, Lisa Yan, and Matthew E. Taylor. Reducing the Team Uncertainty Penalty: Empirical and Theoretical Approaches. In Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains (AAMAS), May 2011.
@inproceedings(MSDM11-Alfeld,
author="Scott Alfeld and Kumera Berkele and Stephen A. Desalvo and Tong Pham and Daniel Russo and Lisa Yan and Matthew E. Taylor",
title="Reducing the Team Uncertainty Penalty: Empirical and Theoretical Approaches",
booktitle="Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains (AAMAS)",
month="May",
year= "2011",
wwwnote={<a href="http://teamcore.usc.edu/junyounk/msdm2011/">MSDM-11</a>},
bib2html_pubtype = {Refereed Workshop or Symposium},
bib2html_rescat = {DCOP},
)

• Haitham Bou Ammar, Matthew E. Taylor, Karl Tuyls, and Gerhard Weiss. Reinforcement Learning Transfer using a Sparse Coded Inter-Task Mapping. In Proceedings of the European Workshop on Multi-agent Systems, November 2011.
@inproceedings(EUMASS11-Amar,
author={Haitham Bou Ammar and Matthew E. Taylor and Karl Tuyls and Gerhard Weiss},
title={{Reinforcement Learning Transfer using a Sparse Coded Inter-Task Mapping}},
booktitle={{Proceedings of the European Workshop on Multi-agent Systems}},
month={November},
year={2011},
wwwnote={<a href="http://swarmlab.unimaas.nl/eumas2011/">EUMAS-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
)

• Haitham Bou Ammar, Matthew E. Taylor, and Karl Tuyls. Common Sub-Space Transfer for Reinforcement Learning Tasks (Poster). In The 23rd Benelux Conference on Artificial Intelligence (BNAIC), November 2011. 44% overall acceptance rate
@inproceedings{11BNAIC-Ammar,
author={Haitham Bou Ammar and Matthew E. Taylor and Karl Tuyls},
title={{Common Sub-Space Transfer for Reinforcement Learning Tasks (Poster)}},
booktitle={{The 23rd Benelux Conference on Artificial Intelligence ({BNAIC})}},
month={November},
year={2011},
note={44% overall acceptance rate},
wwwnote={<a href="http://allserv.kahosl.be/bnaic2011/">BNAIC-11</a>},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_pubtype={Short Refereed Conference},
}

• Haitham Bou Ammar and Matthew E. Taylor. Common Subspace Transfer for Reinforcement Learning Tasks. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), May 2011.
@inproceedings(ALA11-Ammar,
author="Haitham Bou Ammar and Matthew E. Taylor",
title="Common Subspace Transfer for Reinforcement Learning Tasks",
booktitle="Proceedings of the Adaptive and Learning Agents workshop (AAMAS)",
month="May",
year= "2011",
wwwnote={<a href="http://como.vub.ac.be/ALA2011/">ALA-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
)

• Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. Transfer Learning via Multiple Inter-Task Mappings. In Proceedings of the European Workshop on Reinforcement Learning (ECML), September 2011.
@inproceedings{EWRL11-Fachantidis,
author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
title={{Transfer Learning via Multiple Inter-Task Mappings}},
booktitle={{Proceedings of the European Workshop on Reinforcement Learning ({ECML})}},
month = {September},
year={2011},
wwwnote={<a href="http://ewrl.wordpress.com/ewrl9-2011/">EWRL-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
}

• W. Bradley Knox, Matthew E. Taylor, and Peter Stone. Understanding Human Teaching Modalities in Reinforcement Learning Environments: A Preliminary Report. In Proceedings of the Agents Learning Interactively from Human Teachers workshop (IJCAI), July 2011.
@inproceedings{ALIHT11-Knox,
author={W. Bradley Knox and Matthew E. Taylor and Peter Stone},
title={{Understanding Human Teaching Modalities in Reinforcement Learning Environments: A Preliminary Report}},
booktitle={{Proceedings of the Agents Learning Interactively from Human Teachers workshop ({IJCAI})}},
month={July},
year={2011},
bib2html_pubtype={Refereed Workshop or Symposium},
}

• Jun-young Kwak, Zhengyu Yin, Rong Yang, Matthew E. Taylor, and Milind Tambe. Robust Execution-time Coordination in DEC-POMDPs Under Model Uncertainty. In Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains (AAMAS), May 2011.
@inproceedings(MSDM11-Kwak,
author={Jun-young Kwak and Zhengyu Yin and Rong Yang and Matthew E. Taylor and Milind Tambe},
title={{Robust Execution-time Coordination in {DEC-POMDPs} Under Model Uncertainty}},
booktitle={{Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains ({AAMAS})}},
month={May},
year={2011},
wwwnote={<a href="http://teamcore.usc.edu/junyounk/msdm2011/">MSDM-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Distributed POMDPs},
)

• Jun-young Kwak, Rong Yang, Zhengyu Yin, Matthew E. Taylor, and Milind Tambe. Towards Addressing Model Uncertainty: Robust Execution-time Coordination for Teamwork (Short Paper). In The IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT), August 2011. Short Paper: 21% acceptance rate for papers, additional 28% for short papers

Despite their worst-case NEXP-complete planning complexity, DEC-POMDPs remain a popular framework for multiagent teamwork. This paper introduces effective teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) as a novel challenge for DEC-POMDPs and presents MODERN, the first execution-centric framework for DEC-POMDPs explicitly motivated by addressing such model uncertainty. MODERN’s shift of coordination reasoning from planning-time to execution-time avoids the high cost of computing optimal plans whose promised quality may not be realized in practice. There are three key ideas in MODERN: (i) it maintains an exponentially smaller model of other agents’ beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model via bounded pruning; (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, and limits communication to key trigger points; and (iii) it limits its decision-theoretic reasoning about communication to trigger points and uses a systematic markup to encourage extra communication at these points – thus reducing uncertainty among team members at trigger points. We empirically show that MODERN is substantially faster than existing DEC-POMDP execution-centric methods while achieving significantly higher reward.

@inproceedings{11IAT-Kwak,
author={Jun-young Kwak and Rong Yang and Zhengyu Yin and Matthew E. Taylor and Milind Tambe},
title={{Towards Addressing Model Uncertainty: Robust Execution-time Coordination for Teamwork (Short Paper)}},
booktitle={{The {IEEE/WIC/ACM} International Conference on Intelligent Agent Technology ({IAT})}},
month={August},
year={2011},
note={Short Paper: 21% acceptance rate for papers, additional 28% for short papers},
wwwnote={<a href="http://liris.cnrs.fr/~wi-iat11/IAT_2011/">IAT-11</a>},
bib2html_rescat={Distributed POMDPs},
bib2html_pubtype={Short Refereed Conference},
abstract={Despite their worst-case NEXP-complete planning complexity, DEC-POMDPs remain a popular framework for multiagent teamwork. This paper introduces effective teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) as a novel challenge for DEC-POMDPs and presents MODERN, the first execution-centric framework for DEC-POMDPs explicitly motivated by addressing such model uncertainty. MODERN's shift of coordination reasoning from planning-time to execution-time avoids the high cost of computing optimal plans whose promised quality may not be realized in practice. There are three key ideas in MODERN: (i) it maintains an exponentially smaller model of other agents' beliefs and actions than in previous work and then further reduces the computation-time and space expense of this model via bounded pruning; (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, and limits communication to key trigger points; and (iii) it limits its decision-theoretic reasoning about communication to trigger points and uses a systematic markup to encourage extra communication at these points - thus reducing uncertainty among team members at trigger points. We empirically show that MODERN is substantially faster than existing DEC-POMDP execution-centric methods while achieving significantly higher reward.},
}

• Jun-young Kwak, Rong Yang, Zhengyu Yin, Matthew E. Taylor, and Milind Tambe. Teamwork in Distributed POMDPs: Execution-time Coordination Under Model Uncertainty (Poster). In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. Extended Abstract: 22% acceptance rate for papers, additional 25% for extended abstracts
@inproceedings{11AAMAS-Kwak,
author={Jun-young Kwak and Rong Yang and Zhengyu Yin and Matthew E. Taylor and Milind Tambe},
title={{Teamwork in Distributed {POMDP}s: Execution-time Coordination Under Model Uncertainty (Poster)}},
booktitle={{International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2011},
note={Extended Abstract: 22% acceptance rate for papers, additional 25% for extended abstracts},
wwwnote={<a href="http://aamas2011.tw">AAMAS-11</a>},
bib2html_rescat={Distributed POMDPs},
bib2html_pubtype={Short Refereed Conference},
}

• Paul Scerri, Balajee Kannan, Pras Velagapudi, Kate Macarthur, Peter Stone, Matthew E. Taylor, John Dolan, Alessandro Farinelli, Archie Chapman, Bernadine Dias, and George Kantor. Flood Disaster Mitigation: A Real-world Challenge Problem for Multi-Agent Unmanned Surface Vehicles. In Proceedings of the Autonomous Robots and Multirobot Systems workshop (AAMAS), May 2011.

As we advance the state of technology for robotic systems, there is a need for defining complex real-world challenge problems for the multi-agent/robot community to address. A well-defined challenge problem can motivate researchers to aggressively address and overcome core domain challenges that might otherwise take years to solve. As the focus of multi-agent research shifts from the mature domains of UGVs and UAVs to USVs, there is a need for outlining well-defined and realistic challenge problems. In this position paper, we define one such problem, flood disaster mitigation. The ability to respond quickly and effectively to disasters is essential to saving lives and limiting the scope of damage. The nature of floods dictates the need for a fleet of low-cost and small autonomous boats that can provide situational awareness (SA), damage assessment and deliver supplies before more traditional emergency response assets can access an affected area. In addition to addressing an essential need, the outlined application provides an interesting challenge problem for advancing fundamental research in multi-agent systems (MAS) specific to the USV domain. In this paper, we define a technical statement of this MAS challenge problem and outline MAS-specific technical constraints based on the associated real-world constraints. Core MAS sub-problems that must be solved for this application include coordination, control, human interaction, autonomy, task allocation, and communication. This problem provides a concrete and real-world MAS application that will bring together researchers with a diverse range of expertise to develop and implement the necessary algorithms and mechanisms.

@inproceedings(ARMS11-Scerri,
author={Paul Scerri and Balajee Kannan and Pras Velagapudi and Kate Macarthur and Peter Stone and Matthew E. Taylor and John Dolan and Alessandro Farinelli and Archie Chapman and Bernadine Dias and George Kantor},
title={{Flood Disaster Mitigation: A Real-world Challenge Problem for Multi-Agent Unmanned Surface Vehicles}},
booktitle={{Proceedings of the Autonomous Robots and Multirobot Systems workshop ({AAMAS})}},
month={May},
year={2011},
abstract={As we advance the state of technology for robotic systems, there is a need for defining complex real-world challenge problems for the multi-agent/robot community to address. A well-defined challenge problem can motivate researchers to aggressively address and overcome core domain challenges that might otherwise take years to solve. As the focus of multi-agent research shifts from the mature domains of UGVs and UAVs to USVs, there is a need for outlining well-defined and realistic challenge problems. In this position paper, we define one such problem, flood disaster mitigation. The ability to respond quickly and effectively to disasters is essential to saving lives and limiting the scope of damage. The nature of floods dictates the need for a fleet of low-cost and small autonomous boats that can provide situational awareness (SA), damage assessment and deliver supplies before more traditional emergency response assets can access an affected area. In addition to addressing an essential need, the outlined application provides an interesting challenge problem for advancing fundamental research in multi-agent systems (MAS) specific to the USV domain. In this paper, we define a technical statement of this MAS challenge problem and outline MAS-specific technical constraints based on the associated real-world constraints. Core MAS sub-problems that must be solved for this application include coordination, control, human interaction, autonomy, task allocation, and communication. This problem provides a concrete and real-world MAS application that will bring together researchers with a diverse range of expertise to develop and implement the necessary algorithms and mechanisms.},
wwwnote={<a href="http://www.alg.ewi.tudelft.nl/arms2011/">ARMS-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
)

• Matthew E. Taylor, Brian Kulis, and Fei Sha. Metric Learning for Reinforcement Learning Agents. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
@inproceedings{11AAMAS-MetricLearn-Taylor,
author="Matthew E. Taylor and Brian Kulis and Fei Sha",
title = {{Metric Learning for Reinforcement Learning Agents}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2011},
note = {22% acceptance rate},
wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning},
}

• Matthew E. Taylor, Halit Bener Suay, and Sonia Chernova. Integrating Reinforcement Learning with Human Demonstrations of Varying Ability. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
@inproceedings{11AAMAS-HAT-Taylor,
author="Matthew E. Taylor and Halit Bener Suay and Sonia Chernova",
title = {{Integrating Reinforcement Learning with Human Demonstrations of Varying Ability}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2011},
note = {22% acceptance rate},
wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning},
}

• Matthew E. Taylor, Manish Jain, Christopher Kiekintveld, Jun-young Kwak, Rong Yang, Zhengyu Yin, and Milind Tambe. Two Decades of Multiagent Teamwork Research: Past, Present, and Future. In C. Guttmann, F. Dignum, and M. Georgeff, editors, Collaborative Agents – Research and Development (CARE) 2009-2010, volume 6066 of Lecture Notes in Artificial Intelligence. Springer-Verlag, 2011.
@incollection{11CARE-Taylor,
author={Matthew E. Taylor and Manish Jain and Christopher Kiekintveld and Jun-young Kwak and Rong Yang and Zhengyu Yin and Milind Tambe},
title={Two Decades of Multiagent Teamwork Research: Past, Present, and Future},
editor={C. Guttmann and F. Dignum and M. Georgeff},
booktitle={Collaborative Agents - REsearch and Development {(CARE)} 2009-2010},
publisher={Springer-Verlag},
series={Lecture Notes in Artificial Intelligence},
volume={6066},
year={2011},
bib2html_pubtype={Invited Book Chapter},
bib2html_rescat={DCOP},
}

• Matthew E. Taylor, Halit Bener Suay, and Sonia Chernova. Using Human Demonstrations to Improve Reinforcement Learning. In The AAAI 2011 Spring Symposium — Help Me Help You: Bridging the Gaps in Human-Agent Collaboration, March 2011.

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration.

@inproceedings(AAAI11Symp-Taylor,
author={Matthew E. Taylor and Halit Bener Suay and Sonia Chernova},
title={{Using Human Demonstrations to Improve Reinforcement Learning}},
booktitle={{The {AAAI} 2011 Spring Symposium --- Help Me Help You: Bridging the Gaps in Human-Agent Collaboration}},
month={March},
year={2011},
wwwnote={<a href="http://www.isi.edu/~maheswar/hmhy2011.html">HMHY2011</a>},
abstract={This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
)

• Matthew E. Taylor. Teaching Reinforcement Learning with Mario: An Argument and Case Study. In Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence, August 2011.
@inproceedings(EAAI11-Taylor,
author={Matthew E. Taylor},
title={{Teaching Reinforcement Learning with Mario: An Argument and Case Study}},
booktitle={{Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence}},
month={August},
year={2011},
wwwnote={<a href="http://eaai.stanford.edu">EAAI-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Pedagogy},
)

• Matthew E. Taylor, Manish Jain, Prateek Tandon, Makoto Yokoo, and Milind Tambe. Distributed On-line Multi-Agent Optimization Under Uncertainty: Balancing Exploration and Exploitation. Advances in Complex Systems, 2011.
@article{ACS11-Taylor,
author={Matthew E. Taylor and Manish Jain and Prateek Tandon and Makoto Yokoo and Milind Tambe},
title={{Distributed On-line Multi-Agent Optimization Under Uncertainty: Balancing Exploration and Exploitation}},
journal={{Advances in Complex Systems}},
year={2011},
bib2html_pubtype={Journal Article},
bib2html_rescat={DCOP}
}

• Matthew E. Taylor, Christopher Kiekintveld, and Milind Tambe. Evaluating Deployed Decision Support Systems for Security: Challenges, Arguments, and Approaches. In Milind Tambe, editor, Security Games: Theory, Deployed Applications, Lessons Learned, pages 254-283. Cambridge University Press, 2011.
@incollection(11Evaluation-Taylor,
author={Matthew E. Taylor and Christopher Kiekintveld and Milind Tambe},
title={{Evaluating Deployed Decision Support Systems for Security: Challenges, Arguments, and Approaches}},
editor={Milind Tambe},
booktitle={{Security Games: Theory, Deployed Applications, Lessons Learned}},
publisher={Cambridge University Press},
year={2011},
pages={254-283},
isbn={978-1-107-09642-4},
bib2html_pubtype={Invited Book Chapter},
bib2html_rescat={Security},
)

• Matthew E. Taylor. Model Assignment: Reinforcement Learning in a Generalized Mario Domain. In Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence, August 2011.
@inproceedings(EAAI11-ModelAssignment,
author={Matthew E. Taylor},
title={{Model Assignment: Reinforcement Learning in a Generalized Mario Domain}},
booktitle={{Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence}},
month={August},
year={2011},
wwwnote={<a href="http://eaai.stanford.edu">EAAI-11</a><br><a href="http://www.cs.lafayette.edu/~taylorm/11EAAI/index.html">Assignment Webpage</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Pedagogy},
)

• Matthew E. Taylor and Peter Stone. An Introduction to Inter-task Transfer for Reinforcement Learning. AI Magazine, 32(1):15-34, 2011.
@article{AAAIMag11-Taylor,
author={Matthew E. Taylor and Peter Stone},
title={{An Introduction to Inter-task Transfer for Reinforcement Learning}},
journal={{{AI} Magazine}},
year={2011},
volume={32},
number={1},
pages={15--34},
bib2html_pubtype={Journal Article},
bib2html_rescat={Reinforcement Learning, Transfer Learning}
}

• Jason Tsai, Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein, Gal Kaminka, Stacy Marsella, Andrew Ogden, Inbal Rika, Ankur Sheel, Matthew E. Taylor, Xuezhi Wang, Avishay Zilka, and Milind Tambe. ESCAPES: Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social Comparison. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
@inproceedings{11AAMAS-Tsai,
author={Jason Tsai and Natalie Fridman and Emma Bowring and Matthew Brown and Shira Epstein and Gal Kaminka and Stacy Marsella and Andrew Ogden and Inbal Rika and Ankur Sheel and Matthew E. Taylor and {Xuezhi Wang} and Avishay Zilka and Milind Tambe},
title = {{ESCAPES: Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social Comparison}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2011},
note = {22% acceptance rate},
wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
bib2html_pubtype = {Refereed Conference},
}

• Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone. Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), April 2011.
@inproceedings(ADPRL11-Whiteson,
author={Shimon Whiteson and Brian Tanner and Matthew E. Taylor and Peter Stone},
title={{Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning}},
booktitle={{Proceedings of the {IEEE} Symposium on Adaptive Dynamic Programming and Reinforcement Learning ({ADPRL})}},
month={April},
year={2011},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
bib2html_funding={Reinforcement Learning},
)

### 2010

• Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe. Towards a Theoretic Understanding of DCEE. In Proceedings of the Distributed Constraint Reasoning workshop (AAMAS), May 2010.

Common wisdom says that the greater the level of teamwork, the higher the performance of the team. In teams of cooperative autonomous agents, working together rather than independently can increase the team reward. However, recent results show that in uncertain environments, increasing the level of teamwork can actually decrease overall performance. Coined the team uncertainty penalty, this phenomenon has been shown empirically in simulation, but the underlying mathematics are not yet understood. By understanding the mathematics, we could develop algorithms that reduce or eliminate this penalty of increased teamwork. In this paper we investigate the team uncertainty penalty on two fronts. First, we provide results of robots exhibiting the same behavior seen in simulations. Second, we present a mathematical foundation by which to analyze the phenomenon. Using this model, we present findings indicating that the team uncertainty penalty is inherent to the level of teamwork allowed, rather than to specific algorithms.

@inproceedings(DCR10-Alfeld,
author={Scott Alfeld and Matthew E. Taylor and Prateek Tandon and Milind Tambe},
title={{Towards a Theoretic Understanding of {DCEE}}},
booktitle={{Proceedings of the Distributed Constraint Reasoning workshop ({AAMAS})}},
month={May},
year={2010},
wwwnote={<a href="https://www.cs.drexel.edu/dcr2010">DCR-10</a>},
abstract={Common wisdom says that the greater the level of teamwork, the higher the performance of the team. In teams of cooperative autonomous agents, working together rather than independently can increase the team reward. However, recent results show that in uncertain environments, increasing the level of teamwork can actually decrease overall performance. Coined the team uncertainty penalty, this phenomenon has been shown empirically in simulation, but the underlying mathematics are not yet understood. By understanding the mathematics, we could develop algorithms that reduce or eliminate this penalty of increased teamwork. <br> In this paper we investigate the team uncertainty penalty on two fronts. First, we provide results of robots exhibiting the same behavior seen in simulations. Second, we present a mathematical foundation by which to analyze the phenomenon. Using this model, we present findings indicating that the team uncertainty penalty is inherent to the level of teamwork allowed, rather than to specific algorithms.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={DCOP, Robotics},
)

• Samuel Barrett, Matthew E. Taylor, and Peter Stone. Transfer Learning for Reinforcement Learning on a Physical Robot. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), May 2010.

As robots become more widely available, many capabilities that were once only practical to develop and test in simulation are becoming feasible on real, physically grounded, robots. This newfound feasibility is important because simulators rarely represent the world with sufficient fidelity that developed behaviors will work as desired in the real world. However, development and testing on robots remains difficult and time consuming, so it is desirable to minimize the number of trials needed when developing robot behaviors. This paper focuses on reinforcement learning (RL) on physically grounded robots. A few noteworthy exceptions notwithstanding, RL has typically been done purely in simulation, or, at best, initially in simulation with the eventual learned behaviors run on a real robot. However, some recent RL methods exhibit sufficiently low sample complexity to enable learning entirely on robots. One such method is transfer learning for RL. The main contribution of this paper is the first empirical demonstration that transfer learning can significantly speed up and even improve asymptotic performance of RL done entirely on a physical robot. In addition, we show that transferring information learned in simulation can bolster additional learning on the robot.

@inproceedings(ALA10-Barrett,
author={Samuel Barrett and Matthew E. Taylor and Peter Stone},
title={{Transfer Learning for Reinforcement Learning on a Physical Robot}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={May},
year={2010},
wwwnote={<a href="http://www-users.cs.york.ac.uk/~grzes/ala10/">ALA-10</a>},
abstract={ As robots become more widely available, many capabilities that were once only practical to develop and test in simulation are becoming feasible on real, physically grounded, robots. This newfound feasibility is important because simulators rarely represent the world with sufficient fidelity that developed behaviors will work as desired in the real world. However, development and testing on robots remains difficult and time consuming, so it is desirable to minimize the number of trials needed when developing robot behaviors. <br> This paper focuses on reinforcement learning (RL) on physically grounded robots. A few noteworthy exceptions notwithstanding, RL has typically been done purely in simulation, or, at best, initially in simulation with the eventual learned behaviors run on a real robot. However, some recent RL methods exhibit sufficiently low sample complexity to enable learning entirely on robots. One such method is transfer learning for RL. The main contribution of this paper is the first empirical demonstration that transfer learning can significantly speed up and even improve asymptotic performance of RL done entirely on a physical robot. In addition, we show that transferring information learned in simulation can bolster additional learning on the robot.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning, Robotics},
)

• Marc Ponsen, Matthew E. Taylor, and Karl Tuyls. Abstraction and Generalization in Reinforcement Learning. In Matthew E. Taylor and Karl Tuyls, editors, Adaptive Agents and Multi-Agent Systems IV, volume 5924, pages 1-33. Springer-Verlag, 2010.
@incollection(Ponsen10,
author={Marc Ponsen and Matthew E. Taylor and Karl Tuyls},
title={{Abstraction and Generalization in Reinforcement Learning}},
booktitle={{Adaptive Agents and Multi-Agent Systems {IV}}},
editor={Matthew E. Taylor and Karl Tuyls},
publisher={Springer-Verlag},
year={2010},
pages={1--33},
volume={5924},
bib2html_pubtype={Invited Book Chapter},
bib2html_rescat={Reinforcement Learning},
)

• Matthew E. Taylor and Sonia Chernova. Integrating Human Demonstration and Reinforcement Learning: Initial Results in Human-Agent Transfer. In Proceedings of the Agents Learning Interactively from Human Teachers workshop (AAMAS), May 2010.

This work introduces Human-Agent Transfer (HAT), a method that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations can be transferred into a baseline policy for an agent, and reinforcement learning can be used to significantly improve policy performance. These results are an important initial step that suggest that agents can not only quickly learn to mimic human actions, but that they can also learn to surpass the abilities of the teacher.

@inproceedings(ALIHT10-Taylor,
author={Matthew E. Taylor and Sonia Chernova},
title={{Integrating Human Demonstration and Reinforcement Learning: Initial Results in Human-Agent Transfer}},
booktitle={{Proceedings of the Agents Learning Interactively from Human Teachers workshop ({AAMAS})}},
month={May},
year={2010},
abstract={ This work introduces Human-Agent Transfer (HAT), a method that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations can be transferred into a baseline policy for an agent, and reinforcement learning can be used to significantly improve policy performance. These results are an important initial step that suggest that agents can not only quickly learn to mimic human actions, but that they can also learn to surpass the abilities of the teacher.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
)
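The HAT recipe described above (human demonstrations seeding a baseline policy, which reinforcement learning then refines) can be sketched in a toy setting. This is an illustrative approximation, not the paper's method or domain: the chain MDP, the always-move-right "demonstration" policy, and the Q-value bonus used to encode it are all invented for the example.

```python
import random

# Toy chain MDP: states 0..4, actions 0 (left) / 1 (right); reward at state 4.
N_STATES, ACTIONS, GOAL = 5, (0, 1), 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

# Hypothetical "demonstration": a human teacher who always moves right.
demo_policy = {s: 1 for s in range(N_STATES)}

# Seed the learner by giving demonstrated actions a small value bonus --
# one simple way to turn demonstrations into a baseline policy.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for s, a in demo_policy.items():
    Q[(s, a)] = 0.1

# Refinement phase: ordinary epsilon-greedy Q-learning on top of the seed.
random.seed(0)
alpha, gamma, eps = 0.5, 0.9, 0.1
for _ in range(200):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

greedy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
```

Because the seeded bias steers early exploration toward the demonstrated behavior, the agent reaches the goal from its very first episodes instead of random-walking, which is the intuition behind the reported speedup; the refinement phase can still overwrite the seed wherever the teacher was wrong.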

• Matthew E. Taylor, Christopher Kiekintveld, Craig Western, and Milind Tambe. A Framework for Evaluating Deployed Security Systems: Is There a Chink in your ARMOR?. Informatica, 34(2):129-139, 2010.

A growing number of security applications are being developed and deployed to explicitly reduce risk from adversaries’ actions. However, there are many challenges when attempting to evaluate such systems, both in the lab and in the real world. Traditional evaluations used by computer scientists, such as runtime analysis and optimality proofs, may be largely irrelevant. The primary contribution of this paper is to provide a preliminary framework which can guide the evaluation of such systems and to apply the framework to the evaluation of ARMOR (a system deployed at LAX since August 2007). This framework helps to determine what evaluations could, and should, be run in order to measure a system’s overall utility. A secondary contribution of this paper is to help familiarize our community with some of the difficulties inherent in evaluating deployed applications, focusing on those in security domains.

@article{Informatica10-Taylor,
author={Matthew E. Taylor and Christopher Kiekintveld and Craig Western and Milind Tambe},
title={{A Framework for Evaluating Deployed Security Systems: Is There a Chink in your {ARMOR}?}},
journal={{Informatica}},
year={2010},
volume={34},
number={2},
pages={129--139},
abstract={A growing number of security applications are being developed and deployed to explicitly reduce risk from adversaries' actions. However, there are many challenges when attempting to \emph{evaluate} such systems, both in the lab and in the real world. Traditional evaluations used by computer scientists, such as runtime analysis and optimality proofs, may be largely irrelevant. The primary contribution of this paper is to provide a preliminary framework which can guide the evaluation of such systems and to apply the framework to the evaluation of ARMOR (a system deployed at LAX since August 2007). This framework helps to determine what evaluations could, and should, be run in order to measure a system's overall utility. A secondary contribution of this paper is to help familiarize our community with some of the difficulties inherent in evaluating deployed applications, focusing on those in security domains.},
bib2html_pubtype={Journal Article},
bib2html_rescat={Security},
bib2html_funding={CREATE}
}

• Matthew E. Taylor, Katherine E. Coons, Behnam Robatmili, Bertrand A. Maher, Doug Burger, and Kathryn S. McKinley. Evolving Compiler Heuristics to Manage Communication and Contention. In Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI), July 2010. Nectar Track, 25% acceptance rate

As computer architectures become increasingly complex, hand-tuning compiler heuristics becomes increasingly tedious and time consuming for compiler developers. This paper presents a case study that uses a genetic algorithm to learn a compiler policy. The target policy implicitly balances communication and contention among processing elements of the TRIPS processor, a physically realized prototype chip. We learn specialized policies for individual programs as well as general policies that work well across all programs. We also employ a two-stage method that first classifies the code being compiled based on salient characteristics, and then chooses a specialized policy based on that classification. This work is particularly interesting for the AI community because it (1) emphasizes the need for increased collaboration between AI researchers and researchers from other branches of computer science and (2) discusses a machine learning setup where training on the custom hardware requires weeks of training, rather than the more typical minutes or hours.

@inproceedings(AAAI10-Nectar-taylor,
author="Matthew E. Taylor and Katherine E. Coons and Behnam Robatmili and Bertrand A. Maher and Doug Burger and Kathryn S. McKinley",
title={{Evolving Compiler Heuristics to Manage Communication and Contention}},
note = "Nectar Track, 25% acceptance rate",
booktitle={{Proceedings of the Twenty-Fourth Conference on Artificial Intelligence ({AAAI})}},
month="July",year="2010",
abstract="
As computer architectures become increasingly complex, hand-tuning
compiler heuristics becomes increasingly tedious and time consuming
for compiler developers. This paper presents a case study that uses a
genetic algorithm to learn a compiler policy. The target policy
implicitly balances communication and contention among processing
elements of the TRIPS processor, a physically realized prototype chip.
We learn specialized policies for individual programs as well as
general policies that work well across all programs. We also employ a
two-stage method that first classifies the code being compiled based
on salient characteristics, and then chooses a specialized policy
based on that classification.
<br>
This work is particularly interesting for the AI community because it
(1) emphasizes the need for increased collaboration between AI
researchers and researchers from other branches of computer science
and (2) discusses a machine learning setup where training on the custom
hardware requires weeks, rather than the more typical
minutes or hours.",
wwwnote={<a href="http://www.aaai.org/Conferences/AAAI/aaai10.php">AAAI-2010</a>. This paper is based on results presented in our earlier <a href="b2hd-PACT08-coons.html">PACT-08 paper</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Genetic Algorithms},
bib2html_funding = {NSF, DARPA}
)
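The evolutionary search described above can be sketched in miniature. This is not the paper's actual system (which evolves NEAT neural networks and evaluates them on the TRIPS prototype); it is a generic genetic-algorithm loop over a flat heuristic weight vector, with all function names, parameters, and the toy fitness invented for illustration:

```python
import random

def evolve(fitness, n_weights=4, pop_size=20, generations=30, seed=0):
    # Minimal evolutionary loop: truncation selection, averaging
    # crossover, and Gaussian mutation over a weight vector that
    # stands in for a learned compiler placement heuristic.
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append([(x + y) / 2 + rng.gauss(0, 0.1)
                             for x, y in zip(a, b)])
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: negative squared distance to a hidden "ideal" heuristic.
target = [0.5, -0.3, 0.8, 0.1]
best = evolve(lambda w: -sum((x - t) ** 2 for x, t in zip(w, target)))
```

In the paper's setting each fitness evaluation meant compiling and running programs on custom hardware, which is why training took weeks rather than the minutes this toy loop needs.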

• Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, and Milind Tambe. When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2010. 24% acceptance rate

Increasing teamwork between agents typically increases the performance of a multi-agent system, at the cost of increased communication and higher computational complexity. This work examines joint actions in the context of a multi-agent optimization problem where agents must cooperate to balance exploration and exploitation. Surprisingly, results show that increased teamwork can hurt agent performance, even when communication and computation costs are ignored, which we term the team uncertainty penalty. This paper introduces this phenomenon, analyzes it, and presents algorithms to reduce the effect of the penalty in our problem setting.

@inproceedings{AAMAS10-Taylor,
author = {Matthew E. Taylor and Manish Jain and Yanquin Jin and Makoto Yokoo and Milind Tambe},
title = {{When Should There be a ``Me'' in ``Team''? {D}istributed Multi-Agent Optimization Under Uncertainty}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2010},
note = {24% acceptance rate},
wwwnote = {<a href="http://www.cse.yorku.ca/AAMAS2010/index.php">AAMAS-10</a>. Supplemental material is available at <a href="http://teamcore.usc.edu/dcop/">http://teamcore.usc.edu/dcop/</a>.},
abstract={Increasing teamwork between agents typically increases the
performance of a multi-agent system, at the cost of increased
communication and higher computational complexity. This work examines
joint actions in the context of a multi-agent optimization problem
where agents must cooperate to balance exploration and
exploitation. Surprisingly, results show that increased teamwork can
hurt agent performance, even when communication and computation costs
are ignored, which we term the team uncertainty penalty. This paper
introduces this phenomenon, analyzes it, and presents algorithms
to reduce the effect of the penalty in our problem setting.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {DCOP},
}

• Shimon Whiteson, Matthew E. Taylor, and Peter Stone. Critical Factors in the Empirical Performance of Temporal Difference and Evolutionary Methods for Reinforcement Learning. Journal of Autonomous Agents and Multi-Agent Systems, 21(1):1-27, 2010.

Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address this shortcoming by presenting results of empirical comparisons between Sarsa and NEAT, two representative methods, in mountain car and keepaway, two benchmark reinforcement learning tasks. In each task, the methods are evaluated in combination with both linear and nonlinear representations to determine their best configurations. In addition, this article tests two specific hypotheses about the critical factors contributing to these methods’ relative performance: 1) that sensor noise reduces the final performance of Sarsa more than that of NEAT, because Sarsa’s learning updates are not reliable in the absence of the Markov property and 2) that stochasticity, by introducing noise in fitness estimates, reduces the learning speed of NEAT more than that of Sarsa. Experiments in variations of mountain car and keepaway designed to isolate these factors confirm both these hypotheses.

@article{JAAMAS09-Whiteson,
author={Shimon Whiteson and Matthew E. Taylor and Peter Stone},
title={{Critical Factors in the Empirical Performance of Temporal Difference and Evolutionary Methods for Reinforcement Learning}},
journal={{Journal of Autonomous Agents and Multi-Agent Systems}},
year={2010},
volume={21},
number={1},
pages={1--27},
abstract={Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address this shortcoming by presenting results of empirical comparisons between Sarsa and NEAT, two representative methods, in mountain car and keepaway, two benchmark reinforcement learning tasks. In each task, the methods are evaluated in combination with both linear and nonlinear representations to determine their best configurations. In addition, this article tests two specific hypotheses about the critical factors contributing to these methods' relative performance: 1) that sensor noise reduces the final performance of Sarsa more than that of NEAT, because Sarsa's learning updates are not reliable in the absence of the Markov property and 2) that stochasticity, by introducing noise in fitness estimates, reduces the learning speed of NEAT more than that of Sarsa. Experiments in variations of mountain car and keepaway designed to isolate these factors confirm both these hypotheses.},
bib2html_pubtype={Journal Article},
bib2html_funding={},
bib2html_rescat={Reinforcement Learning, Machine Learning in Practice}
}
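The Sarsa half of this comparison is driven by a single on-policy temporal-difference update. A minimal tabular sketch (the article itself pairs Sarsa with linear and nonlinear function approximation; everything below is illustrative):

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    # On-policy TD update: Q(s,a) += alpha * [r + gamma*Q(s',a') - Q(s,a)].
    q = Q.get((s, a), 0.0)
    Q[(s, a)] = q + alpha * (r + gamma * Q.get((s2, a2), 0.0) - q)

# Repeated updates on a one-step toy transition (reward 1.0 into a
# terminal state) drive the estimate toward the true return of 1.0.
Q = {}
for _ in range(200):
    sarsa_update(Q, 0, "right", 1.0, "goal", None)
```

NEAT, by contrast, performs no per-step updates: it scores whole policies by episodic fitness, which is why the article's second hypothesis predicts that stochasticity (noise in those fitness estimates) slows NEAT more than Sarsa.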

### 2009

• Manish Jain, Matthew E. Taylor, Makoto Yokoo, and Milind Tambe. DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks. In Proceedings of the Third International Workshop on Agent Technology for Sensor Networks (AAMAS), May 2009.

Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.

@inproceedings(ATSN09-Jain,
author={Manish Jain and Matthew E. Taylor and Makoto Yokoo and Milind Tambe},
title={{{DCOP}s Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks}},
booktitle={{Proceedings of the Third International Workshop on Agent Technology for Sensor Networks ({AAMAS})}},
month={May},
year= {2009},
wwwnote={<a href="http://www.atsn09.org">ATSN-2009</a><br>Superseded by the IJCAI-09 conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-IJCAI09-Jain.html">DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks</a>.},
abstract={Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={DCOP, Robotics},
bib2html_funding={DARPA}
)

• Manish Jain, Matthew E. Taylor, Makoto Yokoo, and Milind Tambe. DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI), July 2009. 26% acceptance rate

Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.

@inproceedings(IJCAI09-Jain,
author="Manish Jain and Matthew E. Taylor and Makoto Yokoo and Milind Tambe",
title={{{DCOP}s Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks}},
booktitle={{Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence ({IJCAI})}},
month="July",
year= "2009",
note = {26% acceptance rate},
wwwnote={<a href="http://www.ijcai-09.org">IJCAI-2009</a>},
abstract={Buoyed by recent successes in the area of distributed
constraint optimization problems (DCOPs), this paper addresses
challenges faced when applying DCOPs to real-world domains. Three
fundamental challenges must be addressed for a class of real-world
domains, requiring novel DCOP algorithms. First, agents may not
know the payoff matrix and must explore the environment to
determine rewards associated with variable settings. Second,
agents may need to maximize total accumulated reward rather than
instantaneous final reward. Third, limited time horizons disallow
exhaustive exploration of the environment. We propose and
implement a set of novel algorithms that combine
decision-theoretic exploration approaches with DCOP-mandated
coordination. In addition to simulation results, we implement
these algorithms on robots, deploying DCOPs on a distributed
mobile sensor network.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {DCOP, Robotics},
bib2html_funding = {DARPA}
)

• Jun-young Kwak, Pradeep Varakantham, Matthew E. Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping. In Proceedings of the Fourth Workshop on Multi-agent Sequential Decision-Making in Uncertain Domains (AAMAS), May 2009.

While distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, a novel algorithm to solve such distributed POMDPs. Two major novelties in TREMOR are (i) use of social model shaping to coordinate agents, (ii) harnessing efficient single-agent POMDP solvers. Experimental results demonstrate that TREMOR may provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

@inproceedings(MSDM09-Kwak,
author={Jun-young Kwak and Pradeep Varakantham and Matthew E. Taylor and Janusz Marecki and Paul Scerri and Milind Tambe},
title={{Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping}},
booktitle={{Proceedings of the Fourth Workshop on Multi-agent Sequential Decision-Making in Uncertain Domains ({AAMAS})}},
month={May},
year= {2009},
wwwnote={<a href="http://www.eecs.harvard.edu/~seuken/msdm2009/">MSDM-2009</a><br> Superseded by the ICAPS-09 conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-ICAPS09-Varakantham.html">Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping</a>.},
abstract={While distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, a novel algorithm to solve such distributed POMDPs. Two major novelties in TREMOR are (i) use of social model shaping to coordinate agents, (ii) harnessing efficient single-agent POMDP solvers. Experimental results demonstrate that TREMOR may provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Distributed POMDPs},
bib2html_funding={ARMY}
)

• Matthew E. Taylor. Assisting Transfer-Enabled Machine Learning Algorithms: Leveraging Human Knowledge for Curriculum Design. In The AAAI 2009 Spring Symposium on Agents that Learn from Human Teachers, March 2009.

Transfer learning is a successful technique that significantly improves machine learning algorithms by training on a sequence of tasks rather than a single task in isolation. However, there is currently no systematic method for deciding how to construct such a sequence of tasks. In this paper, I propose that while humans are well-suited for the task of curriculum development, significant research is still necessary to better understand how to create effective curricula for machine learning algorithms.

@inproceedings(AAAI09SS-Taylor,
author={Matthew E. Taylor},
title={{Assisting Transfer-Enabled Machine Learning Algorithms: Leveraging Human Knowledge for Curriculum Design}},
booktitle={{The {AAAI} 2009 Spring Symposium on Agents that Learn from Human Teachers}},
month={March},
year={2009},
abstract={Transfer learning is a successful technique that significantly improves machine learning algorithms by training on a sequence of tasks rather than a single task in isolation. However, there is
currently no systematic method for deciding how to construct such a sequence of tasks. In this paper, I propose that while humans are well-suited for the task of curriculum development, significant research is still necessary to better understand how to create effective curricula for machine learning algorithms.},
wwwnote={<a href="http://www.cc.gatech.edu/AAAI-SS09-LFH/Home.html">AAAI 2009 Spring Symposium on Agents that Learn from Human Teachers</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
bib2html_funding={}
)

• Matthew E. Taylor and Peter Stone. Categorizing Transfer for Reinforcement Learning. In Poster at the Multidisciplinary Symposium on Reinforcement Learning, June 2009.
@inproceedings{MSRL09-Taylor,
author={Matthew E. Taylor and Peter Stone},
title={{Categorizing Transfer for Reinforcement Learning}},
booktitle={{Poster at the Multidisciplinary Symposium on Reinforcement Learning}},
month={June},
year={2009},
wwwnote={<a href="http://msrl09.rl-community.org/">MSRL-09</a>.},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_pubtype={Refereed Workshop or Symposium},
}

• Matthew E. Taylor, Chris Kiekintveld, Craig Western, and Milind Tambe. Beyond Runtimes and Optimality: Challenges and Opportunities in Evaluating Deployed Security Systems. In Proceedings of the AAMAS-09 Workshop on Agent Design: Advancing from Practice to Theory, May 2009.

As multi-agent research transitions into the real world, evaluation becomes an increasingly important challenge. One can run controlled and repeatable tests in a laboratory environment, but such tests may be difficult, or even impossible, once the system is deployed. Furthermore, traditional metrics used by computer scientists, such as runtime analysis, may be largely irrelevant.

@inproceedings(ADAPT09-Taylor,
author={Matthew E. Taylor and Chris Kiekintveld and Craig Western and Milind Tambe},
title={{Beyond Runtimes and Optimality: Challenges and Opportunities in Evaluating Deployed Security Systems}},
booktitle={{Proceedings of the {AAMAS}-09 Workshop on Agent Design: Advancing from Practice to Theory}},
month={May},
year={2009},
abstract={ As multi-agent research transitions into the real world, evaluation becomes an increasingly important challenge. One can run controlled and repeatable tests in a laboratory environment, but such tests may be difficult, or even impossible, once the system is deployed. Furthermore, traditional metrics used by computer scientists, such as runtime analysis, may be largely irrelevant.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Security},
bib2html_funding={CREATE}
)

• Matthew E. Taylor, Chris Kiekintveld, Craig Western, and Milind Tambe. Is There a Chink in Your ARMOR? Towards Robust Evaluations for Deployed Security Systems. In Proceedings of the IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications, July 2009.
@inproceedings(QRASA09-Taylor,
author={Matthew E. Taylor and Chris Kiekintveld and Craig Western and Milind Tambe},
title={{Is There a Chink in Your ARMOR? {T}owards Robust Evaluations for Deployed Security Systems}},
booktitle={{Proceedings of the {IJCAI} 2009 Workshop on Quantitative Risk Analysis for Security Applications}},
month={July},
year={2009},
wwwnote={<a href="http://teamcore.usc.edu/QRASA-09">QRASA-2009</a><br>Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-Informatica10-Taylor.html">A Framework for Evaluating Deployed Security Systems: Is There a Chink in your ARMOR?</a>.},
abstract={},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Security},
bib2html_funding={CREATE}
)

• Matthew E. Taylor and Peter Stone. Transfer Learning for Reinforcement Learning Domains: A Survey. Journal of Machine Learning Research, 10(1):1633-1685, 2009.

The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.

@article{JMLR09-taylor,
author={Matthew E. Taylor and Peter Stone},
title={{Transfer Learning for Reinforcement Learning Domains: A Survey}},
journal={{Journal of Machine Learning Research}},
volume={10},
number={1},
pages={1633--1685},
year={2009},
abstract={The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.},
bib2html_pubtype={Journal Article},
bib2html_funding={NSF, DARPA},
bib2html_rescat={Reinforcement Learning, Transfer Learning}
}

• Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains. In Proceedings of the IJCAI 2009 Workshop on Distributed Constraint Reasoning, July 2009.

Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents’ decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems.

@inproceedings(DCR09-Taylor,
author={Matthew E. Taylor and Manish Jain and Prateek Tandon and Milind Tambe},
title={{Using {DCOP}s to Balance Exploration and Exploitation in Time-Critical Domains}},
booktitle={{Proceedings of the {IJCAI} 2009 Workshop on Distributed Constraint Reasoning}},
month={July},
year={2009},
wwwnote={<a href="http://www-scf.usc.edu/~wyeoh/DCR09/">DCR-2009</a>},
abstract={Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents' decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the \emph{Distributed Constraint Optimization Problem} framework and demonstrates when, and at what cost, increasing agents' coordination can improve the global reward on such problems.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={DCOP},
bib2html_funding={ARMY}
)
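The explore/exploit tension described in the abstract can be illustrated with a single pair of agents and an unknown joint reward table. This sketch uses plain epsilon-greedy selection rather than the paper's DCOP-based algorithms; the function names, parameters, and hidden reward are all invented for illustration:

```python
import random

def explore_exploit(reward, settings, rounds=500, eps=0.2, seed=1):
    # On-line maximization of an unknown pairwise reward: with
    # probability eps try a random joint setting (explore), otherwise
    # replay the best estimate so far (exploit), accumulating reward.
    rng = random.Random(seed)
    est, counts, total = {}, {}, 0.0
    for _ in range(rounds):
        if rng.random() < eps or not est:
            joint = (rng.choice(settings), rng.choice(settings))
        else:
            joint = max(est, key=est.get)
        r = reward(*joint)
        total += r
        counts[joint] = counts.get(joint, 0) + 1
        est[joint] = est.get(joint, 0.0) + (r - est.get(joint, 0.0)) / counts[joint]
    return total, max(est, key=est.get)

# Hidden pairwise reward: the agents do best when both pick setting 2.
total, best = explore_exploit(lambda x, y: 1.0 if x == y == 2 else 0.1,
                              [0, 1, 2])
```

The paper's algorithms replace the uncoordinated random exploration here with decision-theoretic choices propagated over the constraint graph, which is exactly where the cost/benefit of extra coordination shows up.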

• Jason Tsai, Emma Bowring, Shira Epstein, Natalie Fridman, Prakhar Garg, Gal Kaminka, Andrew Ogden, Milind Tambe, and Matthew E. Taylor. Agent-based Evacuation Modeling: Simulating the Los Angeles International Airport. In Proceedings of the Workshop on Emergency Management: Incident, Resource, and Supply Chain Management, November 2009.
@inproceedings(EMWS09-Tsai,
author={Jason Tsai and Emma Bowring and Shira Epstein and Natalie Fridman and Prakhar Garg and Gal Kaminka and Andrew Ogden and Milind Tambe and Matthew E. Taylor},
title={{Agent-based Evacuation Modeling: Simulating the Los Angeles International Airport}},
booktitle={{Proceedings of the Workshop on Emergency Management: Incident, Resource, and Supply Chain Management}},
month={November},
year={2009},
wwwnote={<a href="http://www.ics.uci.edu/~projects/cert/EMWS09">EMWS09-2009</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Security},
)

• Pradeep Varakantham, Jun-young Kwak, Matthew E. Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping. In Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling (ICAPS), September 2009. 34% acceptance rate

Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

@inproceedings(ICAPS09-Varakantham,
author="Pradeep Varakantham and Jun-young Kwak and Matthew E. Taylor and Janusz Marecki and Paul Scerri and Milind Tambe",
title={{Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping}},
booktitle={{Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling ({ICAPS})}},
month="September",
year= "2009",
note = {34% acceptance rate},
wwwnote={<a href="http://icaps09.uom.gr">ICAPS-2009</a>},
abstract={ Distributed POMDPs provide an expressive framework for
modeling multiagent collaboration problems, but NEXP-Complete
complexity hinders their scalability and application in real-world
domains. This paper introduces a subclass of distributed POMDPs,
and TREMOR, an algorithm to solve such distributed POMDPs. The
primary novelty of TREMOR is that agents plan individually with a
single agent POMDP solver and use social model shaping to
implicitly coordinate with other agents. Experiments demonstrate
that TREMOR can provide solutions orders of magnitude faster than
existing algorithms while achieving comparable, or even superior,
solution quality.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Distributed POMDPs},
bib2html_funding = {ARMY}
)

• Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone. Generalized Domains for Empirical Evaluations in Reinforcement Learning. In Proceedings of the Fourth Workshop on Evaluation Methods for Machine Learning at ICML-09, June 2009.

Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to method overfitting, wherein results may not generalize to similar environments. To address this problem, we advocate empirical evaluations using generalized domains: parameterized problem generators that explicitly encode variations in the environment to which the learner should be robust. We argue that evaluating across a set of these generated problems offers a more meaningful evaluation of reinforcement learning algorithms.

@inproceedings(ICMLWS09-Whiteson,
author={Shimon Whiteson and Brian Tanner and Matthew E. Taylor and Peter Stone},
title={{Generalized Domains for Empirical Evaluations in Reinforcement Learning}},
booktitle={{Proceedings of the Fourth Workshop on Evaluation Methods for Machine Learning at {ICML}-09}},
month={June},
year={2009},
wwwnote={<a href="http://www.site.uottawa.ca/ICML09WS/">Fourth annual workshop on Evaluation Methods for Machine Learning</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
bib2html_funding={},
abstract={Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to method overfitting, wherein results may not generalize to similar environments. To address this problem, we advocate empirical evaluations using generalized domains: parameterized problem generators that explicitly encode variations in the environment to which the learner should be robust. We argue that evaluating across a set of these generated problems offers a more meaningful evaluation of reinforcement learning algorithms.},
)
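The "generalized domain" idea is concrete: a parameterized problem generator draws environment variations so that algorithms are scored across a distribution of tasks rather than one tuned instance. A minimal sketch (the parameter names echo mountain car, but the ranges are invented for illustration and not taken from the paper):

```python
import random

def make_mountain_car(seed):
    # One draw from a parameterized problem generator: each variant
    # perturbs the task's physics, so a learner whose parameters were
    # tuned to a single instance cannot overfit the whole evaluation.
    rng = random.Random(seed)
    return {
        "gravity": rng.uniform(0.002, 0.003),
        "force": rng.uniform(0.0008, 0.0012),
        "goal_position": rng.uniform(0.45, 0.55),
    }

# Evaluate a learner across many generated variants, not one instance.
variants = [make_mountain_car(seed) for seed in range(10)]
```

An evaluation would then report aggregate performance over `variants`, directly rewarding the robustness the workshop paper argues for.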

### 2008

• Katherine E. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Kathryn McKinley, and Doug Burger. Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning. In Proceedings of the Seventeenth International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 32-42, October 2008. 19% acceptance rate

Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribution of computation and data become increasingly important performance factors. Explicit Dataflow Graph Execution (EDGE) processors, in which instructions communicate with one another directly on a distributed substrate, give the compiler control over communication overheads at a fine granularity. Prior work shows that compilers can effectively reduce fine-grained communication overheads in EDGE architectures using a spatial instruction placement algorithm with a heuristic-based cost function. While this algorithm is effective, the cost function must be painstakingly tuned. Heuristics tuned to perform well across a variety of applications leave users with little ability to tune performance-critical applications, yet we find that the best placement heuristics vary significantly with the application.

First, we suggest a systematic feature selection method that reduces the feature set size based on the extent to which features affect performance. To automatically discover placement heuristics, we then use these features as input to a reinforcement learning technique, called Neuro-Evolution of Augmenting Topologies (NEAT), that uses a genetic algorithm to evolve neural networks. We show that NEAT outperforms simulated annealing, the most commonly used optimization technique for instruction placement. We use NEAT to learn general heuristics that are as effective as hand-tuned heuristics, but we find that improving over highly hand-tuned general heuristics is difficult. We then suggest a hierarchical approach to machine learning that classifies segments of code with similar characteristics and learns heuristics for these classes. This approach performs closer to the specialized heuristics. Together, these results suggest that learning compiler heuristics may benefit from both improved feature selection and classification.

@inproceedings{PACT08-coons,
author="Katherine K. Coons and Behnam Robatmili and Matthew E. Taylor and Bertrand A. Maher and Kathryn McKinley and Doug Burger",
title={{Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning}},
booktitle={{Proceedings of the Seventeenth International Conference on Parallel Architectures and Compilation Techniques ({PACT})}},
month="October",
year="2008",
pages="32--42",
note = {19% acceptance rate},
wwwnote={<a href="http://www.eecg.toronto.edu/pact/">PACT-2008</a>},
abstract = {Communication overheads are one of the fundamental challenges in a
multiprocessor system. As the number of processors on a chip increases,
communication overheads and the distribution of computation and data
become increasingly important performance factors. Explicit Dataflow
Graph Execution (EDGE) processors, in which instructions communicate
with one another directly on a distributed substrate, give the compiler
control over communication overheads at a fine granularity. Prior work
shows that compilers can effectively reduce fine-grained communication
overheads in EDGE architectures using a spatial instruction placement
algorithm with a heuristic-based cost function. While this algorithm is
effective, the cost function must be painstakingly tuned. Heuristics tuned
to perform well across a variety of applications leave users with little
ability to tune performance-critical applications, yet we find that the
best placement heuristics vary significantly with the application.
<p>
First, we suggest a systematic feature selection method that reduces the
feature set size based on the extent to which features affect performance.
To automatically discover placement heuristics, we then use these features
as input to a reinforcement learning technique, called Neuro-Evolution
of Augmenting Topologies (NEAT), that uses a genetic algorithm to evolve
neural networks. We show that NEAT outperforms simulated annealing, the
most commonly used optimization technique for instruction placement. We
use NEAT to learn general heuristics that are as effective as hand-tuned
heuristics, but we find that improving over highly hand-tuned general
heuristics is difficult. We then suggest a hierarchical approach
to machine learning that classifies segments of code with similar
characteristics and learns heuristics for these classes. This approach
performs closer to the specialized heuristics. Together, these results
suggest that learning compiler heuristics may benefit from both improved
feature selection and classification.
},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Autonomic Computing, Machine Learning in Practice},
}

• Matthew E. Taylor, Nicholas K. Jong, and Peter Stone. Transferring Instances for Model-Based Reinforcement Learning. In The Adaptive Learning Agents and Multi-Agent Systems (ALAMAS+ALAG) workshop at AAMAS, May 2008.

Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have only been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample complexity and asymptotic performance of a model-based algorithm when learning in a continuous state space.

@inproceedings(AAMAS08-ALAMAS-Taylor,
author={Matthew E. Taylor and Nicholas K. Jong and Peter Stone},
title={{Transferring Instances for Model-Based Reinforcement Learning}},
booktitle={{The Adaptive Learning Agents and Multi-Agent Systems ({ALAMAS+ALAG}) workshop at {AAMAS}}},
month={May},
year={2008},
abstract = {\emph{Reinforcement learning} agents typically require a significant amount of data before performing well on complex tasks. \emph{Transfer learning} methods have made progress reducing sample complexity, but they have only been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample complexity and asymptotic performance of a model-based algorithm when learning in a continuous state space.},
wwwnote={<a href="http://ki.informatik.uni-wuerzburg.de/~kluegl/ALAMAS.ALAg/">AAMAS 2008 workshop on Adaptive Learning Agents and Multi-Agent Systems</a><br> Superseded by the ECML-08 conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-ECML08-Taylor.html">Transferring Instances for Model-Based Reinforcement Learning</a>.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning, Planning},
bib2html_funding={NSF, DARPA}
)

• Matthew E. Taylor. Autonomous Inter-Task Transfer in Reinforcement Learning Domains. PhD thesis, Department of Computer Sciences, The University of Texas at Austin, August 2008. Available as Technical Report UT-AI-TR-08-5.

@PhDThesis{Thesis-taylor,
author={Matthew E. Taylor},
title={{Autonomous Inter-Task Transfer in Reinforcement Learning Domains}},
school={Department of Computer Sciences, The University of Texas at Austin},
year={2008},
month={August},
abstract={Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. While these methods have had experimental successes and have been shown to exhibit some desirable properties in theory, the basic learning algorithms have often been found slow in practice. Therefore, much of the current RL research focuses on speeding up learning by taking advantage of domain knowledge, or by better utilizing agents' experience. The ambitious goal of transfer learning, when applied to RL tasks, is to accelerate learning on some target task after training on a different, but related, source task. This dissertation demonstrates that transfer learning methods can successfully improve learning in RL tasks via experience from previously learned tasks. Transfer learning can increase RL's applicability to difficult tasks by allowing agents to generalize their experience across learning problems.<br>
This dissertation presents inter-task mappings, the first transfer mechanism in this area to successfully enable transfer between tasks with different state variables and actions. Inter-task mappings have subsequently been used by a number of transfer researchers. A set of six transfer learning algorithms are then introduced. While these transfer methods differ in terms of what base RL algorithms they are compatible with, what type of knowledge they transfer, and what their strengths are, all utilize the same inter-task mapping mechanism. These transfer methods can all successfully use mappings constructed by a human from domain knowledge, but there may be situations in which domain knowledge is unavailable, or insufficient, to describe how two given tasks are related. We therefore also study how inter-task mappings can be learned autonomously by leveraging existing machine learning algorithms. Our methods use classification and regression techniques to successfully discover similarities between data gathered in pairs of tasks, culminating in what is currently one of the most robust mapping-learning algorithms for RL transfer.<br>
Combining transfer methods with these similarity-learning algorithms allows us to empirically demonstrate the plausibility of autonomous transfer. We fully implement these methods in four domains (each with different salient characteristics), show that transfer can significantly improve an agent's ability to learn in each domain, and explore the limits of transfer's applicability.},
note={Available as Technical Report UT-AI-TR-08-5.},
bib2html_pubtype={Dissertation},
bib2html_rescat={Reinforcement Learning, Transfer Learning}
}

• Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. Autonomous Transfer for Reinforcement Learning. In Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 283-290, May 2008. 22% acceptance rate

Recent work in transfer learning has succeeded in making reinforcement learning algorithms more efficient by incorporating knowledge from previous tasks. However, such methods typically must be provided either a full model of the tasks or an explicit relation mapping one task into the other. An autonomous agent may not have access to such high-level information, but would be able to analyze its experience to find similarities between tasks. In this paper we introduce Modeling Approximate State Transitions by Exploiting Regression (MASTER), a method for automatically learning a mapping from one task to another through an agent’s experience. We empirically demonstrate that such learned relationships can significantly improve the speed of a reinforcement learning algorithm in a series of Mountain Car tasks. Additionally, we demonstrate that our method may also assist with the difficult problem of task selection for transfer.

@inproceedings{AAMAS08-taylor,
author="Matthew E. Taylor and Gregory Kuhlmann and Peter Stone",
title={{Autonomous Transfer for Reinforcement Learning}},
booktitle={{Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year="2008",
pages="283--290",
abstract={Recent work in transfer learning has succeeded in
making reinforcement learning algorithms more
efficient by incorporating knowledge from previous
tasks. However, such methods typically must be
provided either a full model of the tasks or an
explicit relation mapping one task into the
other. An autonomous agent may not have access to
such high-level information, but would be able to
analyze its experience to find similarities between
tasks. In this paper we introduce Modeling
Approximate State Transitions by Exploiting
Regression (MASTER), a method for automatically
learning a mapping from one task to another through
an agent's experience. We empirically demonstrate
that such learned relationships can significantly
improve the speed of a reinforcement learning
algorithm in a series of Mountain Car
tasks. Additionally, we demonstrate that our method
may also assist with the difficult problem of task
selection for transfer.},
note = {22% acceptance rate},
wwwnote={<a href="http://gaips.inesc-id.pt/aamas2008/">AAMAS-2008</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {DARPA, NSF}
}

• Matthew E. Taylor, Nicholas K. Jong, and Peter Stone. Transferring Instances for Model-Based Reinforcement Learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pages 488-505, September 2008. 19% acceptance rate

Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have primarily been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample efficiency and asymptotic performance of a model-based algorithm when learning in a continuous state space. Additionally, we conduct experiments to test the limits of TIMBREL's effectiveness.

@inproceedings(ECML08-taylor,
author="Matthew E. Taylor and Nicholas K. Jong and Peter Stone",
title={{Transferring Instances for Model-Based Reinforcement Learning}},
booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ({ECML PKDD})}},
pages="488--505",
month="September",
year= "2008",
note = {19% acceptance rate},
wwwnote={<a href="http://www.ecmlpkdd2008.org/">ECML-2008</a>},
abstract={Reinforcement learning agents typically require a significant
amount of data before performing well on complex tasks. Transfer
learning methods have made progress reducing sample complexity,
but they have primarily been applied to model-free learning
methods, not more data-efficient model-based learning
methods. This paper introduces TIMBREL, a novel method capable of
transferring information effectively into a model-based
reinforcement learning algorithm. We demonstrate that TIMBREL can
significantly improve the sample efficiency and asymptotic
performance of a model-based algorithm when learning in a
continuous state space. Additionally, we conduct experiments to
test the limits of TIMBREL's effectiveness.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning, Planning},
bib2html_funding = {NSF, DARPA}
)

• Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. Transfer Learning and Intelligence: an Argument and Approach. In Proceedings of the First Conference on Artificial General Intelligence (AGI), March 2008. 50% acceptance rate

In order to claim fully general intelligence in an autonomous agent, the ability to learn is one of the most central capabilities. Classical machine learning techniques have had many significant empirical successes, but large real-world problems that are of interest to generally intelligent agents require learning much faster (with much less training experience) than is currently possible. This paper presents transfer learning, where knowledge from a learned task can be used to significantly speed up learning in a novel task, as the key to achieving the learning capabilities necessary for general intelligence. In addition to motivating the need for transfer learning in an intelligent agent, we introduce a novel method for selecting types of tasks to be used for transfer and empirically demonstrate that such a selection can lead to significant increases in training speed in a two-player game.

@inproceedings(AGI08-taylor,
author="Matthew E. Taylor and Gregory Kuhlmann and Peter Stone",
title={{Transfer Learning and Intelligence: an Argument and Approach}},
booktitle={{Proceedings of the First Conference on Artificial General Intelligence ({AGI})}},
month="March",
year="2008",
abstract="In order to claim fully general intelligence in an
autonomous agent, the ability to learn is one of the most
central capabilities. Classical machine learning techniques
have had many significant empirical successes, but large
real-world problems that are of interest to generally
intelligent agents require learning much faster (with much
less training experience) than is currently possible. This
paper presents transfer learning, where knowledge
from a learned task can be used to significantly speed up
learning in a novel task, as the key to achieving the
learning capabilities necessary for general intelligence. In
addition to motivating the need for transfer learning in an
intelligent agent, we introduce a novel method for selecting
types of tasks to be used for transfer and empirically
demonstrate that such a selection can lead to significant
increases in training speed in a two-player game.",
note = {50% acceptance rate},
wwwnote={<a href="http://agi-08.org/">AGI-2008</a><br> A video of the talk is available.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning},
bib2html_funding = {NSF, DARPA},
)

### 2007

• Mazda Ahmadi, Matthew E. Taylor, and Peter Stone. IFSA: Incremental Feature-Set Augmentation for Reinforcement Learning Tasks. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1120-1127, May 2007. 22% acceptance rate, Finalist for Best Student Paper

Reinforcement learning is a popular and successful framework for many agent-related problems because only limited environmental feedback is necessary for learning. While many algorithms exist to learn effective policies in such problems, learning is often used to solve real world problems, which typically have large state spaces, and therefore suffer from the “curse of dimensionality.” One effective method for speeding-up reinforcement learning algorithms is to leverage expert knowledge. In this paper, we propose a method for dynamically augmenting the agent’s feature set in order to speed up value-function-based reinforcement learning. The domain expert divides the feature set into a series of subsets such that a novel problem concept can be learned from each successive subset. Domain knowledge is also used to order the feature subsets in order of their importance for learning. Our algorithm uses the ordered feature subsets to learn tasks significantly faster than if the entire feature set is used from the start. Incremental Feature-Set Augmentation (IFSA) is fully implemented and tested in three different domains: Gridworld, Blackjack and RoboCup Soccer Keepaway. All experiments show that IFSA can significantly speed up learning and motivates the applicability of this novel RL method.

@inproceedings{AAMAS07-ahmadi,
author="Mazda Ahmadi and Matthew E. Taylor and Peter Stone",
title={{{IFSA}: Incremental Feature-Set Augmentation for Reinforcement Learning Tasks}},
booktitle={{Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
pages="1120--1127",
month="May",
year="2007",
abstract={
Reinforcement learning is a popular and successful framework for
many agent-related problems because only limited environmental
feedback is necessary for learning. While many algorithms exist to
learn effective policies in such problems, learning is often
used to solve real world problems, which typically have large state
spaces, and therefore suffer from the ``curse of dimensionality.''
One effective method for speeding-up reinforcement learning algorithms
is to leverage expert knowledge. In this paper, we propose a method
for dynamically augmenting the agent's feature set in order to
speed up value-function-based reinforcement learning. The domain
expert divides the feature set into a series of subsets such that a
novel problem concept can be learned from each successive
subset. Domain knowledge is also used to order the feature subsets in
order of their importance for learning. Our algorithm uses the
ordered feature subsets to learn tasks significantly faster than if
the entire feature set is used from the start. Incremental
Feature-Set Augmentation (IFSA) is fully implemented and tested in
three different domains: Gridworld, Blackjack and RoboCup Soccer
Keepaway. All experiments show that IFSA can significantly speed up
learning and motivates the applicability of this novel RL method.},
note = {22% acceptance rate, Finalist for Best Student Paper},
wwwnote={<span align="left" style="color: red; font-weight: bold">Best Student Paper Nomination</span> at <a href="http://www.aamas2007.nl/">AAMAS-2007</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning},
bib2html_funding = {DARPA, NSF,ONR},
}

• Matthew E. Taylor and Peter Stone. Towards Reinforcement Learning Representation Transfer (Poster). In The Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 683-685, May 2007. Poster: 22% acceptance rate for talks, additional 25% for posters.

Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer methods are able to successfully transfer knowledge between agents in different reinforcement learning tasks, reducing the time needed to learn the target. However, the complementary task of representation transfer, i.e., transferring knowledge between agents with different internal representations, has not been well explored. The goal in both types of transfer problems is the same: reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work introduces one such representation transfer algorithm which is implemented in a complex multiagent domain. Experiments demonstrate that transferring the learned knowledge between different representations is both possible and beneficial.

@inproceedings{AAMAS07-taylorRT,
author={Matthew E. Taylor and Peter Stone},
title={{Towards Reinforcement Learning Representation Transfer (Poster)}},
booktitle={{The Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
pages={683--685},
month={May},
year={2007},
abstract={Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer methods are able to successfully transfer knowledge between agents in different reinforcement learning tasks, reducing the time needed to learn the target. However, the complimentary task of representation transfer, i.e.\ transferring knowledge between agents with different internal representations, has not been well explored. The goal in both types of transfer problems is the same: reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work introduces one such representation transfer algorithm which is implemented in a complex multiagent domain. Experiments demonstrate that transferring the learned knowledge between different representations is both possible and beneficial.},
note = "Poster: 22% acceptance rate for talks, additional 25% for posters.",
wwwnote={<a href="http://www.aamas2007.nl/">AAMAS-2007</a>. <br>Superseded by the symposium paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-AAAI07-Symposium.html">Representation Transfer for Reinforcement Learning</a>.},
bib2html_pubtype={Short Refereed Conference},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_funding={DARPA, NSF},
}

• Matthew E. Taylor, Katherine E. Coons, Behnam Robatmili, Doug Burger, and Kathryn S. McKinley. Policy Search Optimization for Spatial Path Planning. In NIPS-07 workshop on Machine Learning for Systems Problems, December 2007. (Two page extended abstract.)
@inproceedings(NIPS07-taylor,
author={Matthew E. Taylor and Katherine E. Coons and Behnam Robatmili and Doug Burger and Kathryn S. McKinley},
title={{Policy Search Optimization for Spatial Path Planning}},
booktitle={{{NIPS}-07 workshop on Machine Learning for Systems Problems}},
month={December},
year={2007},
note={(Two page extended abstract.)},
wwwnote={<a href="http://radlab.cs.berkeley.edu/MLSys/">NIPS 2007 workshop on Machine Learning for Systems Problems</a><br> Superseded by the PACT-08 conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-PACT08-Coons.html">Using Reinforcement Learning to Select Policy Features for Distributed Instruction Placement</a>.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Autonomic Computing, Machine Learning in Practice},
)

• Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. Accelerating Search with Transferred Heuristics. In ICAPS-07 workshop on AI Planning and Learning, September 2007.

@inproceedings(ICAPS07WS-taylor,
author={Matthew E. Taylor and Gregory Kuhlmann and Peter Stone},
title={{Accelerating Search with Transferred Heuristics}},
booktitle={{{ICAPS}-07 workshop on AI Planning and Learning}},
month={September},
year={2007},
wwwnote={<a href="http://www.cs.umd.edu/users/ukuter/icaps07aipl/">ICAPS 2007 workshop on AI Planning and Learning</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Planning},
bib2html_funding={NSF, DARPA}
)

• Matthew E. Taylor and Peter Stone. Representation Transfer for Reinforcement Learning. In AAAI 2007 Fall Symposium on Computational Approaches to Representation Change during Learning and Development, November 2007.

Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer learning methods are able to successfully transfer knowledge from a source reinforcement learning task into a target task, reducing learning time. However, the complementary task of transferring knowledge between agents with different internal representations has not been well explored. The goal in both types of transfer problems is the same: reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work defines representation transfer, contrasts it with task transfer, and introduces two novel algorithms. Additionally, we show representation transfer algorithms can also be successfully used for task transfer, providing an empirical connection between the two problems. These algorithms are fully implemented in a complex multiagent domain and experiments demonstrate that transferring the learned knowledge between different representations is both possible and beneficial.

@inproceedings(AAAI07-Symposium,
author={Matthew E. Taylor and Peter Stone},
title={{Representation Transfer for Reinforcement Learning}},
booktitle={{{AAAI} 2007 Fall Symposium on Computational Approaches to Representation Change during Learning and Development}},
month={November},
year={2007},
abstract={Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer learning methods are able to successfully transfer knowledge from a source reinforcement learning task into a target task, reducing learning time. However, the complementary task of transferring knowledge between agents with different internal representations has not been well explored. The goal in both types of transfer problems is the same: reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work defines representation transfer, contrasts it with task transfer, and introduces two novel algorithms. Additionally, we show representation transfer algorithms can also be successfully used for task transfer, providing an empirical connection between the two problems. These algorithms are fully implemented in a complex multiagent domain and experiments demonstrate that transferring the learned knowledge between different representations is both possible and beneficial.},
wwwnote={<a href="http://yertle.isi.edu/~clayton/aaai-fss07/index.php/Welcome">2007 AAAI Fall Symposium: Computational Approaches to Representation Change during Learning and Development</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_funding={DARPA, NSF},
)

• Matthew E. Taylor, Peter Stone, and Yaxin Liu. Transfer Learning via Inter-Task Mappings for Temporal Difference Learning. Journal of Machine Learning Research, 8(1):2125-2167, 2007.

@article{JMLR07-taylor,
author={Matthew E. Taylor and Peter Stone and Yaxin Liu},
title={{Transfer Learning via Inter-Task Mappings for Temporal Difference Learning}},
journal={{Journal of Machine Learning Research}},
year={2007},
volume={8},
number={1},
pages={2125--2167},
bib2html_pubtype={Journal Article},
bib2html_funding={NSF, DARPA},
bib2html_rescat={Reinforcement Learning, Transfer Learning}
}

• Matthew E. Taylor, Cynthia Matuszek, Bryan Klimt, and Michael Witbrock. Autonomous Classification of Knowledge into an Ontology. In Proceedings of the Twentieth International FLAIRS Conference (FLAIRS), May 2007. 52% acceptance rate

Ontologies are an increasingly important tool in knowledge representation, as they allow large amounts of data to be related in a logical fashion. Current research is concentrated on automatically constructing ontologies, merging ontologies with different structures, and optimal mechanisms for ontology building; in this work we consider the related, but distinct, problem of how to automatically determine where to place new knowledge into an existing ontology. Rather than relying on human knowledge engineers to carefully classify knowledge, it is becoming increasingly important for machine learning techniques to automate such a task. Automation is particularly important as the rate of ontology building via automatic knowledge acquisition techniques increases. This paper compares three well-established machine learning techniques and shows that they can be applied successfully to this knowledge placement task. Our methods are fully implemented and tested in the Cyc knowledge base system.

@inproceedings{FLAIRS07-taylor-ontology,
author="Matthew E. Taylor and Cynthia Matuszek and Bryan Klimt and Michael Witbrock",
title={{Autonomous Classification of Knowledge into an Ontology}},
booktitle={{Proceedings of the Twentieth International FLAIRS Conference ({FLAIRS})}},
month="May",
year="2007",
abstract="Ontologies are an increasingly important tool in
knowledge representation, as they allow large amounts of data
to be related in a logical fashion. Current research is
concentrated on automatically constructing ontologies, merging
ontologies with different structures, and optimal mechanisms
for ontology building; in this work we consider the related,
but distinct, problem of how to automatically determine where
to place new knowledge into an existing ontology. Rather than
relying on human knowledge engineers to carefully classify
knowledge, it is becoming increasingly important for machine
learning techniques to automate such a task. Automation is
particularly important as the rate of ontology building via
automatic knowledge acquisition techniques increases. This
paper compares three well-established machine learning
techniques and shows that they can be applied successfully to
this knowledge placement task. Our methods are fully
implemented and tested in the Cyc knowledge base system.",
note = {52% acceptance rate},
wwwnote={<a href="http://www.cise.ufl.edu/~ddd/FLAIRS/flairs2007/">FLAIRS-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Ontologies, Machine Learning in Practice},
bib2html_funding = {DARPA},
}

• Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison. In Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI), pages 1675-1678, July 2007. Nectar Track, 38% acceptance rate

Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving difficult RL problems, but few rigorous comparisons have been conducted. Thus, no general guidelines describing the methods’ relative strengths and weaknesses are available. This paper summarizes a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. The results from this study help isolate the factors critical to the performance of each learning method and yield insights into their general strengths and weaknesses.

@inproceedings(AAAI07-taylor,
author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
title={{Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison}},
pages="1675--1678",
booktitle={{Proceedings of the Twenty-Second Conference on Artificial Intelligence ({AAAI})}},
month="July",
year="2007",
abstract="Reinforcement learning (RL) methods have become
popular in recent years because of their ability to solve
complex tasks with minimal feedback. Both genetic algorithms
(GAs) and temporal difference (TD) methods have proven
effective at solving difficult RL problems, but few rigorous
comparisons have been conducted. Thus, no general guidelines
describing the methods' relative strengths and weaknesses are
available. This paper summarizes a detailed empirical
comparison between a GA and a TD method in Keepaway, a
standard RL benchmark domain based on robot soccer. The
results from this study help isolate the factors critical to
the performance of each learning method and yield insights
into their general strengths and weaknesses.",
note = {Nectar Track, 38% acceptance rate},
wwwnote={<a href="http://www.aaai.org/Conferences/National/2007/aaai07.html">AAAI-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Genetic Algorithms},
bib2html_funding = {NSF, DARPA}
)

• Matthew E. Taylor and Peter Stone. Cross-Domain Transfer for Reinforcement Learning. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), June 2007. 29% acceptance rate

A typical goal for transfer learning algorithms is to utilize knowledge gained in a source task to learn a target task faster. Recently introduced transfer methods in reinforcement learning settings have shown considerable promise, but they typically transfer between pairs of very similar tasks. This work introduces Rule Transfer, a transfer algorithm that first learns rules to summarize a source task policy and then leverages those rules to learn faster in a target task. This paper demonstrates that Rule Transfer can effectively speed up learning in Keepaway, a benchmark RL problem in the robot soccer domain, based on experience from source tasks in the gridworld domain. We empirically show, through the use of three distinct transfer metrics, that Rule Transfer is effective across these domains.

@inproceedings(ICML07-taylor,
author="Matthew E. Taylor and Peter Stone",
title={{Cross-Domain Transfer for Reinforcement Learning}},
booktitle={{Proceedings of the Twenty-Fourth International Conference on Machine Learning ({ICML})}},
month="June",
year="2007",
abstract="A typical goal for transfer learning algorithms is
to utilize knowledge gained in a source task to learn a
target task faster. Recently introduced transfer methods in
reinforcement learning settings have shown considerable
promise, but they typically transfer between pairs of very
similar tasks. This work introduces Rule Transfer, a
transfer algorithm that first learns rules to summarize a
source task policy and then leverages those rules to learn
faster in a target task. This paper demonstrates that Rule
Transfer can effectively speed up learning in Keepaway, a
benchmark RL problem in the robot soccer domain, based on
experience from source tasks in the gridworld domain. We
empirically show, through the use of three distinct transfer
metrics, that Rule Transfer is effective across these
domains.",
note = {29% acceptance rate},
wwwnote={<a href="http://oregonstate.edu/conferences/icml2007">ICML-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {NSF, DARPA},
)
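As a toy illustration of the Rule Transfer idea described above, the sketch below summarizes a tabular source-task policy as IF-THEN rules and translates them into a target task through hand-coded state and action mappings. The gridworld, mappings, and function names are hypothetical; the paper learns compact rules with a rule learner rather than enumerating state-action pairs as done here.

```python
# Hypothetical sketch of Rule Transfer: summarize a learned source-task
# policy as rules, then reuse those rules (through hand-coded mappings)
# to bootstrap learning in a different target task.

def summarize_policy_as_rules(policy):
    """Extract rules from a tabular source policy (here, one rule per state)."""
    return [(state, action) for state, action in policy.items()]

def transfer_rules(rules, state_mapping, action_mapping):
    """Translate source rules into the target task's state/action spaces."""
    return [(state_mapping[s], action_mapping[a]) for s, a in rules
            if s in state_mapping and a in action_mapping]

# Source gridworld policy: move right until the goal column is reached.
source_policy = {(0, 0): "right", (0, 1): "right", (0, 2): "stay"}
rules = summarize_policy_as_rules(source_policy)

# Map 1-D corridor states/actions onto a (toy) richer target task.
state_map = {(0, 0): (0, 0, 0), (0, 1): (0, 1, 0), (0, 2): (0, 2, 0)}
action_map = {"right": "east", "stay": "hold"}
target_rules = transfer_rules(rules, state_map, action_map)
print(target_rules)  # rules now expressed in target-task terms
```

The transferred rules would then serve as an initial policy that target-task learning refines, rather than starting tabula rasa.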

• Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 156-163, May 2007. 22% acceptance rate

The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing they perform comparably to hand-coded mappings.

@inproceedings{AAMAS07-taylor,
author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
title={{Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning}},
booktitle={{Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
pages="156--163",
month="May",
year="2007",
abstract={ The ambitious goal of transfer learning is to
accelerate learning on a target task after training on
a different, but related, source task. While many past
transfer methods have focused on transferring
value-functions, this paper presents a method for
transferring policies across tasks with different
state and action spaces. In particular, this paper
utilizes transfer via inter-task mappings for policy
search methods ({\sc tvitm-ps}) to construct a
transfer functional that translates a population of
neural network policies trained via policy search from
a source task to a target task. Empirical results in
robot soccer Keepaway and Server Job Scheduling show
that {\sc tvitm-ps} can markedly reduce learning time
when full inter-task mappings are available. The
results also demonstrate that {\sc tvitm-ps} still
succeeds when given only incomplete inter-task
mappings. Furthermore, we present a novel method for
learning such mappings when they are not
available, and give results showing they perform
comparably to hand-coded mappings. },
note = {22% acceptance rate},
wwwnote={<a href="http://www.aamas2007.nl/">AAMAS-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {DARPA, NSF}
}
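The core of the inter-task mapping idea can be sketched in a few lines: given a mapping between source and target state variables and between actions, initialize the target-task policy by copying the corresponding source weights. The paper transfers populations of neural-network policies; a single linear policy matrix is used here purely to keep the example short, and all names and values are illustrative.

```python
# Minimal, hypothetical sketch of transfer via inter-task mappings:
# build a target policy's weights from a source policy's weights.

def transfer_policy(source_w, state_map, action_map):
    """source_w[a][s]: weight for source action a, state variable s.
    state_map[j]  = source state index feeding target state variable j.
    action_map[i] = source action index feeding target action i."""
    return [[source_w[si][sj] if si is not None and sj is not None else 0.0
             for sj in state_map] for si in action_map]

source_w = [[1.0, -2.0],
            [0.5,  3.0]]        # 2 source actions x 2 source state variables
state_map = [0, 1, 1]           # the target's new 3rd variable mirrors var 1
action_map = [0, 1]             # target actions correspond directly
w = transfer_policy(source_w, state_map, action_map)
print(w)  # [[1.0, -2.0, -2.0], [0.5, 3.0, 3.0]]
```

Duplicating weights for analogous new variables, as above, is what lets a policy trained on 3-vs-2 Keepaway seed learning in 4-vs-3, where extra players add state variables and actions.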

• Matthew E. Taylor, Cynthia Matuszek, Pace Reagan Smith, and Michael Witbrock. Guiding Inference with Policy Search Reinforcement Learning. In Proceedings of the Twentieth International FLAIRS Conference (FLAIRS), May 2007. 52% acceptance rate

Symbolic reasoning is a well understood and effective approach to handling reasoning over formally represented knowledge; however, simple symbolic inference systems necessarily slow as complexity and ground facts grow. As automated approaches to ontology-building become more prevalent and sophisticated, knowledge base systems become larger and more complex, necessitating techniques for faster inference. This work uses reinforcement learning, a statistical machine learning technique, to learn control laws which guide inference. We implement our learning method in ResearchCyc, a very large knowledge base with millions of assertions. A large set of test queries, some of which require tens of thousands of inference steps to answer, can be answered faster after training over an independent set of training queries. Furthermore, this learned inference module outperforms ResearchCyc’s integrated inference module, a module that has been hand-tuned with considerable effort.

@inproceedings{FLAIRS07-taylor-inference,
author="Matthew E. Taylor and Cynthia Matuszek and Pace Reagan Smith and Michael Witbrock",
title={{Guiding Inference with Policy Search Reinforcement Learning}},
booktitle={{Proceedings of the Twentieth International FLAIRS Conference ({FLAIRS})}},
month="May",
year="2007",
abstract="Symbolic reasoning is a well understood and
effective approach to handling reasoning over
formally represented knowledge; however, simple
symbolic inference systems necessarily slow as
complexity and ground facts grow. As automated
approaches to ontology-building become more
prevalent and sophisticated, knowledge base systems
become larger and more complex, necessitating
techniques for faster inference. This work uses
reinforcement learning, a statistical machine
learning technique, to learn control laws which
guide inference. We implement our learning method in
ResearchCyc, a very large knowledge base with
millions of assertions. A large set of test queries,
some of which require tens of thousands of inference
steps to answer, can be answered faster after
training over an independent set of training
queries. Furthermore, this learned inference module
outperforms ResearchCyc's integrated inference
module, a module that has been hand-tuned with
considerable effort.",
note = {52% acceptance rate},
wwwnote={<a href="http://www.cise.ufl.edu/~ddd/FLAIRS/flairs2007/">FLAIRS-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Inference, Machine Learning in Practice},
bib2html_funding = {DARPA},
}

• Shimon Whiteson, Matthew E. Taylor, and Peter Stone. Empirical Studies in Action Selection for Reinforcement Learning. Adaptive Behavior, 15(1), 2007.

To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. This article aims to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together. First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to each method’s performance. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.

@article{AB07-whiteson,
author={Shimon Whiteson and Matthew E. Taylor and Peter Stone},
title={{Empirical Studies in Action Selection for Reinforcement Learning}},
year={2007},
journal={Adaptive Behavior},
volume={15},
number={1},
abstract={To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. This article aims to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together.
First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to each method's performance. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.},
bib2html_pubtype={Journal Article},
bib2html_funding={NSF, DARPA},
bib2html_rescat={Reinforcement Learning}
}

• Shimon Whiteson, Matthew E. Taylor, and Peter Stone. Adaptive Tile Coding for Value Function Approximation. Technical Report AI-TR-07-339, University of Texas at Austin, 2007.

Reinforcement learning problems are commonly tackled by estimating the optimal value function. In many real-world problems, learning this value function requires a function approximator, which maps states to values via a parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. This paper presents *adaptive tile coding*, a novel method that automates this design process for tile coding, a popular function approximator, by beginning with a simple representation with few tiles and refining it during learning by splitting existing tiles into smaller ones. In addition to automatically discovering effective representations, this approach provides a natural way to reduce the function approximator’s level of generalization over time. Empirical results in multiple domains compare two different criteria for deciding which tiles to split and verify that adaptive tile coding can automatically discover effective representations and that its speed of learning is competitive with the best fixed representations.

@techreport{whitesontr07,
author={Whiteson, Shimon and Taylor, Matthew E. and Stone, Peter},
title={{Adaptive Tile Coding for Value Function Approximation}},
institution={University of Texas at Austin},
number={AI-TR-07-339},
year={2007},
abstract={Reinforcement learning problems are commonly tackled by estimating the optimal value function. In many real-world problems, learning this value function requires a function approximator, which maps states to values via a parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. This paper presents \emph{adaptive tile coding}, a novel method that automates this design process for tile coding, a popular function approximator, by beginning with a simple representation with few tiles and refining it during learning by splitting existing tiles into smaller ones. In addition to automatically discovering effective representations, this approach provides a natural way to reduce the function approximator's level of generalization over time. Empirical results in multiple domains compare two different criteria for deciding which tiles to split and verify that adaptive tile coding can automatically discover effective representations and that its speed of learning is competitive with the best fixed representations.},
bib2html_rescat={Reinforcement Learning},
bib2html_funding={NSF, DARPA},
bib2html_pubtype={Technical Report},
}

### 2006

• Peter Stone, Gregory Kuhlmann, Matthew E. Taylor, and Yaxin Liu. Keepaway Soccer: From Machine Learning Testbed to Benchmark. In Itsuki Noda, Adam Jacoff, Ansgar Bredenfeld, and Yasutake Takahashi, editors, RoboCup-2005: Robot Soccer World Cup IX, volume 4020, pages 93-105. Springer-Verlag, Berlin, 2006. 28% acceptance rate at RoboCup-2005

Keepaway soccer has been previously put forth as a *testbed* for machine learning. Although multiple researchers have used it successfully for machine learning experiments, doing so has required a good deal of domain expertise. This paper introduces a set of programs, tools, and resources designed to make the domain easily usable for experimentation without any prior knowledge of RoboCup or the Soccer Server. In addition, we report on new experiments in the Keepaway domain, along with performance results designed to be directly comparable with future experimental results. Combined, the new infrastructure and our concrete demonstration of its use in comparative experiments elevate the domain to a machine learning *benchmark*, suitable for use by researchers across the field.

@incollection(ROBOCUP05-stone,
author={Peter Stone and Gregory Kuhlmann and Matthew E. Taylor and Yaxin Liu},
title={{Keepaway Soccer: From Machine Learning Testbed to Benchmark}},
booktitle={{{R}obo{C}up-2005: Robot Soccer World Cup {IX}}},
editor={Itsuki Noda and Adam Jacoff and Ansgar Bredenfeld and Yasutake Takahashi},
publisher={Springer-Verlag},
year={2006},
volume={4020},
pages={93--105},
abstract={Keepaway soccer has been previously put forth as a \emph{testbed} for machine learning. Although multiple researchers have used it successfully for machine learning experiments, doing so has required a good deal of domain expertise. This paper introduces a set of programs, tools, and resources designed to make the domain easily usable for experimentation without any prior knowledge of RoboCup or the Soccer Server. In addition, we report on new experiments in the Keepaway domain, along with performance results designed to be directly comparable with future experimental results. Combined, the new infrastructure and our concrete demonstration of its use in comparative experiments elevate the domain to a machine learning \emph{benchmark}, suitable for use by researchers across the field.},
note={28% acceptance rate at {R}obo{C}up-2005},
wwwnote={Some <a href="http://www.cs.utexas.edu/users/AustinVilla/sim/keepaway/">simulations of keepaway</a> referenced in the paper and keepaway software.<br>Official version from <a href="http://dx.doi.org/10.1007/11780519_9">Publisher's Webpage</a>&copy; Springer-Verlag},
bib2html_pubtype={Refereed Book Chapter},
bib2html_rescat={Simulated Robot Soccer, Reinforcement Learning},
bib2html_funding={NSF, ONR, DARPA}
)

• Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Transfer Learning for Policy Search Methods. In ICML workshop on Structural Knowledge Transfer for Machine Learning, June 2006.

An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference approach to transfer in reinforcement learning tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies trained via genetic algorithms (GAs) from a source task to a target task. Empirical results in robot soccer Keepaway, a standard RL benchmark domain, demonstrate that transfer via inter-task mapping can markedly reduce the time required to learn a second, more complex, task.

@inproceedings(ICML06-taylor,
author={Matthew E. Taylor and Shimon Whiteson and Peter Stone},
title={{Transfer Learning for Policy Search Methods}},
booktitle={{{ICML} workshop on Structural Knowledge Transfer for Machine Learning}},
month={June},
year={2006},
abstract={An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference approach to transfer in reinforcement learning tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies trained via genetic algorithms (GAs) from a source task to a target task. Empirical results in robot soccer Keepaway, a standard RL benchmark domain, demonstrate that transfer via inter-task mapping can markedly reduce the time required to learn a second, more complex, task.},
wwwnote={<a href="http://www.cs.utexas.edu/~banerjee/icmlws06/">ICML-2006 workshop on Structural Knowledge Transfer for Machine Learning</a>.<br> Superseded by the conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-AAMAS07-taylor.html">Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning</a>.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
bib2html_funding={NSF, DARPA}
)

• Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pages 1321-1328, July 2006. 46% acceptance rate, Best Paper Award in GA track (of 85 submissions)

Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods’ relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT, a GA that evolves neural networks, with Sarsa, a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.

@inproceedings{GECCO06-taylor,
author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
title={{Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning}},
booktitle={{Proceedings of the Genetic and Evolutionary Computation Conference ({GECCO})}},
month="July",
year="2006",
pages="1321--1328",
abstract={
Both genetic algorithms (GAs) and temporal
difference (TD) methods have proven effective at
solving reinforcement learning (RL) problems.
However, since few rigorous empirical comparisons
have been conducted, there are no general guidelines
describing the methods' relative strengths and
weaknesses. This paper presents the results of a
detailed empirical comparison between a GA and a TD
method in Keepaway, a standard RL benchmark domain
based on robot soccer. In particular, we compare
the performance of NEAT~\cite{stanley:ec02evolving},
a GA that evolves neural networks, with
Sarsa~\cite{Rummery94,Singh96}, a popular TD method.
The results demonstrate that NEAT can learn better
policies in this task, though it requires more
evaluations to do so. Additional experiments in two
variations of Keepaway demonstrate that Sarsa learns
better policies when the task is fully observable
and NEAT learns faster when the task is
deterministic. Together, these results help isolate
the factors critical to the performance of each
method and yield insights into their general
strengths and weaknesses.
},
note = {46% acceptance rate, Best Paper Award in GA track (of 85 submissions)},
wwwnote={<span align="left" style="color: red; font-weight: bold">Best Paper Award</span> (Genetic Algorithms Track) at <a href="http://www.sigevo.org/gecco-2006/">GECCO-2006</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Genetic Algorithms, Machine Learning in Practice},
bib2html_funding = {NSF, DARPA}
}
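The TD side of this comparison can be summarized by the Sarsa update rule, sketched here in tabular form on a toy transition. The paper actually uses Sarsa with function approximation in Keepaway; the tabular version and the state/action names below are for illustration only.

```python
# On-policy TD update: Q(s,a) += alpha * [r + gamma * Q(s',a') - Q(s,a)]

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """Update the Q-table in place from one observed transition (s,a,r,s',a')."""
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * Q.get((s2, a2), 0.0) - Q.get((s, a), 0.0))

Q = {}
# One Keepaway-flavored (hypothetical) transition: holding the ball
# earned reward 1.0, and the policy will pass in the next state.
sarsa_update(Q, "s0", "hold", 1.0, "s1", "pass")
print(Q[("s0", "hold")])  # 0.1
```

In contrast, the GA (NEAT) never computes per-step updates like this; it evaluates whole policies by total episode reward, which is why it needs more evaluations but can escape poor value-function representations.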

• Shimon Whiteson, Matthew E. Taylor, and Peter Stone. Adaptive Tile Coding for Reinforcement Learning. In NIPS workshop on: Towards a New Reinforcement Learning?, December 2006.
@inproceedings(NIPS06-Whiteson,
author={Shimon Whiteson and Matthew E. Taylor and Peter Stone},
title={{Adaptive Tile Coding for Reinforcement Learning}},
booktitle={{{NIPS} workshop on: Towards a New Reinforcement Learning?}},
month={December},
year={2006},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
bib2html_funding={NSF, DARPA},
wwwnote={<a href="http://nips.cc/Conferences/2006">NIPS-2006</a> (Poster).<br> Superseded by the technical report <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-whitesontr07.html">Adaptive Tile Coding for Value Function Approximation</a>.},
)

### 2005

• Matthew E. Taylor and Peter Stone. Behavior Transfer for Value-Function-Based Reinforcement Learning. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 53-59, July 2005. 25% acceptance rate.

Temporal difference (TD) learning methods have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found very slow in practice. A key feature of TD methods is that they represent policies in terms of value functions. In this paper we introduce *behavior transfer*, a novel approach to speeding up TD learning by transferring the learned value function from one task to a second related task. We present experimental results showing that autonomous learners are able to learn one multiagent task and then use behavior transfer to markedly reduce the total training time for a more complex task.

@inproceedings{AAMAS05-taylor,
author="Matthew E. Taylor and Peter Stone",
title={{Behavior Transfer for Value-Function-Based Reinforcement Learning}},
booktitle={{Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="July",
year="2005",
pages="53--59",
abstract={
Temporal difference (TD) learning
methods have become popular
reinforcement learning techniques in recent years. TD
methods have had some experimental successes and have
been shown to exhibit some desirable properties in
theory, but have often been found very slow in
practice. A key feature of TD methods is that they
represent policies in terms of value functions. In
this paper we introduce \emph{behavior transfer}, a
novel approach to speeding up TD learning by
transferring the learned value function from one task
to a second related task. We present experimental
results showing that autonomous learners are able to
learn one multiagent task and then use behavior
transfer to markedly reduce the total training time
for a more complex task.
},
note = {25% acceptance rate.},
wwwnote={<a href="http://www.aamas2005.nl/">AAMAS-2005</a>.<br> Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-JMLR07-taylor.html">Transfer Learning via Inter-Task Mappings for Temporal Difference Learning</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {DARPA, NSF},
}
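The essence of behavior transfer for value-function learners can be sketched as follows: instead of starting the target task with zero-initialized values, seed each target Q-value from the learned source Q-value of the corresponding (mapped) state-action pair. The mappings, states, and actions below are toy stand-ins, not the paper's Keepaway setup.

```python
# Hypothetical sketch of behavior transfer: initialize a target task's
# Q-table from a learned source-task Q-table via hand-coded mappings.

def init_target_q(source_q, states, actions, map_state, map_action):
    """Seed target Q-values from mapped source Q-values (0.0 if unmapped)."""
    return {(s, a): source_q.get((map_state(s), map_action(a)), 0.0)
            for s in states for a in actions}

source_q = {("near", "shoot"): 2.0, ("far", "dribble"): 1.0}

# The target task adds a feature (teammate coverage) that the
# state mapping simply drops.
target_q = init_target_q(
    source_q,
    states=[("near", "open"), ("far", "covered")],
    actions=["shoot", "dribble"],
    map_state=lambda s: s[0],   # ignore the new feature
    map_action=lambda a: a,     # actions correspond directly
)
print(target_q[(("near", "open"), "shoot")])  # 2.0
```

TD learning then proceeds normally in the target task; the transferred values simply give it a better-than-random starting point, which is where the reported training-time savings come from.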

• Matthew E. Taylor, Peter Stone, and Yaxin Liu. Value Functions for RL-Based Behavior Transfer: A Comparative Study. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI), July 2005. 18% acceptance rate.

Temporal difference (TD) learning methods have become popular reinforcement learning techniques in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found slow in practice. This paper presents methods for further generalizing across tasks, thereby speeding up learning, via a novel form of behavior transfer. We compare learning on a complex task with three function approximators, a CMAC, a neural network, and an RBF, and demonstrate that behavior transfer works well with all three. Using behavior transfer, agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup-soccer keepaway domain.

@inproceedings(AAAI05-taylor,
author="Matthew E. Taylor and Peter Stone and Yaxin Liu",
title={{Value Functions for {RL}-Based Behavior Transfer: A Comparative Study}},
booktitle={{Proceedings of the Twentieth National Conference on Artificial Intelligence ({AAAI})}},
month="July",
year="2005",
abstract={
Temporal difference (TD) learning methods have
become popular reinforcement learning techniques in
recent years. TD methods, relying on function
approximators to generalize learning to novel
situations, have had some experimental successes and
have been shown to exhibit some desirable properties
in theory, but have often been found slow in
practice. This paper presents methods for further
generalizing across tasks, thereby speeding
up learning, via a novel form of behavior
transfer. We compare learning on a complex task
with three function approximators, a CMAC, a neural
network, and an RBF, and demonstrate that behavior
transfer works well with all three. Using behavior
transfer, agents are able to learn one task and then
markedly reduce the time it takes to learn a more
complex task. Our algorithms are fully implemented
and tested in the RoboCup-soccer keepaway domain.
},
note = {18% acceptance rate.},
wwwnote={<a href="http://www.aaai.org/Conferences/National/2005/aaai05.html">AAAI-2005</a>. <br> Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-JMLR07-taylor.html">Transfer Learning via Inter-Task Mappings for Temporal Difference Learning</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {NSF, DARPA}
)
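One of the three function approximators compared above, the RBF network, can be sketched in a few lines: state values are a weighted sum of Gaussian basis functions, trained by gradient descent toward a target value. The centers, width, and learning rate below are illustrative choices, not the paper's settings.

```python
# Hedged sketch of an RBF value-function approximator on a 1-D state.
import math

class RBFValue:
    def __init__(self, centers, width=0.25):
        self.centers, self.width = centers, width
        self.w = [0.0] * len(centers)   # one weight per basis function

    def features(self, x):
        """Gaussian activation of each basis function at state x."""
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centers]

    def value(self, x):
        return sum(wi * fi for wi, fi in zip(self.w, self.features(x)))

    def update(self, x, target, alpha=0.1):
        """Gradient step moving V(x) toward the target value."""
        err = target - self.value(x)
        self.w = [wi + alpha * err * fi
                  for wi, fi in zip(self.w, self.features(x))]

v = RBFValue(centers=[0.0, 0.5, 1.0])
for _ in range(200):
    v.update(0.5, 1.0)       # repeatedly push V(0.5) toward 1.0
print(round(v.value(0.5), 2))  # 1.0
```

A CMAC replaces the smooth Gaussian features with binary tile activations, and a neural network learns the features themselves; the paper's point is that behavior transfer helps regardless of which of the three is chosen.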

### 2004

• Matthew E. Taylor and Peter Stone. Speeding up Reinforcement Learning with Behavior Transfer. In AAAI 2004 Fall Symposium on Real-life Reinforcement Learning, October 2004.

Reinforcement learning (RL) methods have become popular machine learning techniques in recent years. RL has had some experimental successes and has been shown to exhibit some desirable properties in theory, but it has often been found very slow in practice. In this paper we introduce *behavior transfer*, a novel approach to speeding up traditional RL. We present experimental results showing a learner is able to learn one task and then use behavior transfer to markedly reduce the total training time for a more complex task.

@inproceedings{AAAI04-Symposium,
author={Matthew E. Taylor and Peter Stone},
title={{Speeding up Reinforcement Learning with Behavior Transfer}},
booktitle={{{AAAI} 2004 Fall Symposium on Real-life Reinforcement Learning}},
month={October},
year={2004},
abstract={Reinforcement learning (RL) methods have become popular machine learning techniques in recent years. RL has had some experimental successes and has been shown to exhibit some desirable properties in theory, but it has often been found very slow in practice. In this paper we introduce \emph{behavior transfer}, a novel approach to speeding up traditional RL. We present experimental results showing a learner is able to learn one task and then use behavior transfer to markedly reduce the total training time for a more complex task.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_funding={NSF},
wwwnote={Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-JMLR07-taylor.html">Transfer Learning via Inter-Task Mappings for Temporal Difference Learning</a>.}
}