# Intelligent Robot Learning Laboratory (IRL Lab) Workshop Papers

### 2018

• Gabriel V. de la Cruz Jr., Yunshu Du, and Matthew E. Taylor. Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning. In Poster of the Adaptive Learning Agents (ALA) workshop (at FAIM), Stockholm, Sweden, July 2018.

Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using a deep neural network as its function approximator and by learning directly from raw images. A drawback of using raw images is that deep RL must learn the state feature representation from the raw images in addition to learning a policy. As a result, deep RL often requires a prohibitively large amount of training time and data to reach reasonable performance, making it inapplicable in real-world settings, particularly when data is expensive. In this work, we speed up training by addressing half of what deep RL is trying to solve — feature learning. We show that using a small set of non-expert human demonstrations during a supervised pre-training stage allows significant improvements in training times. We empirically evaluate our approach using the deep Q-network and the asynchronous advantage actor-critic algorithms in the Atari 2600 games of Pong, Freeway, and Beamrider. Our results show that pre-training a deep RL network provides a significant improvement in training time, even when pre-training from a small number of noisy demonstrations.

@inproceedings{2018ALA-DelaCruz,
author={de la Cruz, Jr., Gabriel V. and Du, Yunshu and Taylor, Matthew E.},
title={{Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning}},
booktitle={{Poster of the Adaptive Learning Agents ({ALA}) workshop (at {FAIM})}},
year={2018},
month={July},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using a deep neural network as its function approximator and by learning directly from raw images. A drawback of using raw images is that deep RL must learn the state feature representation from the raw images in addition to learning a policy.
As a result, deep RL often requires a prohibitively large amount of training time and data to reach reasonable performance, making it inapplicable in real-world settings, particularly when data is expensive. In this work, we speed up training by addressing half of what deep RL is trying to solve --- feature learning. We show that using a small set of non-expert human demonstrations during a supervised pre-training stage allows significant improvements in training times.
We empirically evaluate our approach using the deep Q-network and the asynchronous advantage actor-critic algorithms in the Atari 2600 games of Pong, Freeway, and Beamrider. Our results show that pre-training a deep RL network provides a significant improvement in training time, even when pre-training from a small number of noisy demonstrations.}
}
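The pre-training stage described above can be sketched in a few lines. This is an illustrative toy, not the authors' DQN/A3C setup: the linear "network", the state dimension, and the demonstration data are all invented. The idea is simply to fit the network's action probabilities to human (state, action) pairs with supervised cross-entropy before any reinforcement learning begins, so the feature weights start in a sensible place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical sizes): 8-dim state features, 3 actions.
STATE_DIM, N_ACTIONS = 8, 3

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pretrain_from_demos(states, actions, epochs=200, lr=0.5):
    """Supervised pre-training stage: fit action probabilities to the
    (possibly noisy, non-expert) human demonstrations via cross-entropy."""
    W = np.zeros((STATE_DIM, N_ACTIONS))
    onehot = np.eye(N_ACTIONS)[actions]
    for _ in range(epochs):
        probs = softmax(states @ W)
        W -= lr * states.T @ (probs - onehot) / len(states)
    return W  # these weights then initialize the RL agent's network

# Fake "human demonstrations": actions follow a simple hidden rule.
demo_states = rng.normal(size=(100, STATE_DIM))
demo_actions = (demo_states[:, 0] > 0).astype(int)

W0 = pretrain_from_demos(demo_states, demo_actions)
acc = (softmax(demo_states @ W0).argmax(axis=1) == demo_actions).mean()
print(f"demo accuracy after pre-training: {acc:.2f}")
```

The RL phase then starts from `W0` instead of a random initialization; only the feature-learning half of the problem has been given a head start.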

### 2017

• Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. Curriculum Design for Machine Learners in Sequential Decision Tasks. In Proceedings of the Adaptive Learning Agents workshop (at AAMAS), Sao Paulo, Brazil, May 2017.

Existing machine-learning work has shown that algorithms can benefit from curricula—learning first on simple examples before moving to more difficult examples. This work defines the curriculum-design problem in the context of sequential decision tasks, analyzes how different curricula affect agent learning in a Sokoban-like domain, and presents results of a user study that explores whether non-experts generate such curricula. Our results show that 1) different curricula can have substantial impact on training speeds while longer curricula do not always result in worse agent performance in learning all tasks within the curricula (including the target task), 2) more benefits of curricula can be found as the target task’s complexity increases, 3) the method for providing reward feedback to the agent as it learns within a curriculum does not change which curricula are best, 4) non-expert users can successfully design curricula that result in better overall agent performance than learning from scratch, even in the absence of feedback, and 5) non-expert users can discover and follow salient principles when selecting tasks in a curriculum. This work gives us insights into the development of new machine-learning algorithms and interfaces that can better accommodate machine- or human-created curricula.

@inproceedings{2017ALA-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{Curriculum Design for Machine Learners in Sequential Decision Tasks}},
booktitle={{Proceedings of the Adaptive Learning Agents workshop (at {AAMAS})}},
month={May},
year={2017},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Existing machine-learning work has shown that algorithms can benefit from curricula---learning first on simple examples before moving to more difficult examples. This work defines the curriculum-design problem in the context of sequential decision tasks, analyzes how different curricula affect agent learning in a Sokoban-like domain, and presents results of a user study that explores whether non-experts generate such curricula. Our results show that 1) different curricula can have substantial impact on training speeds while longer curricula do not always result in worse agent performance in learning all tasks within the curricula (including the target task), 2) more benefits of curricula can be found as the target task's complexity increases, 3) the method for providing reward feedback to the agent as it learns within a curriculum does not change which curricula are best, 4) non-expert users can successfully design curricula that result in better overall agent performance than learning from scratch, even in the absence of feedback, and 5) non-expert users can discover and follow salient principles when selecting tasks in a curriculum. This work gives us insights into the development of new machine-learning algorithms and interfaces that can better accommodate machine- or human-created curricula. }
}

• Ariel Rosenfeld, Matthew E. Taylor, and Sarit Kraus. Speeding up Tabular Reinforcement Learning Using State-Action Similarities. In Proceedings of the Adaptive Learning Agents (ALA) workshop (at AAMAS), Sao Paulo, Brazil, May 2017. Best Paper Award.

This paper proposes a novel method to speed up temporal difference learning by using state-action similarities. These hand-coded similarities are tested in three well-studied domains, demonstrating our approach’s benefits. Additionally, a human subjects study with 16 programmers shows that the proposed approach can reduce the engineering effort of human designers. These results combine to show that our novel method is not only effective, but can be efficiently used by non-expert designers.

@inproceedings{2017ALA-Rosenfeld,
author={Ariel Rosenfeld and Matthew E. Taylor and Sarit Kraus},
title={{Speeding up Tabular Reinforcement Learning Using State-Action Similarities}},
booktitle={{Proceedings of the Adaptive Learning Agents ({ALA}) workshop (at {AAMAS})}},
year={2017},
month={May},
note={Best Paper Award},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={This paper proposes a novel method to speed up temporal difference learning by using state-action similarities. These hand-coded similarities are tested in three well-studied domains, demonstrating our approach's benefits. Additionally, a human subjects study with 16 programmers shows that the proposed approach can reduce the engineering effort of human designers. These results combine to show that our novel method is not only effective, but can be efficiently used by non-expert designers.}
}
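One way to read the core idea, as a toy sketch under assumed details (the chain domain, the distance-based similarity function, and the exact update rule here are illustrative, not the paper's): a temporal-difference update to one state-action pair also nudges similar pairs, weighted by a hand-coded similarity score, so experience generalizes across the table.

```python
import numpy as np

N_STATES, N_ACTIONS = 10, 2   # toy 1-D chain domain (invented)
ALPHA, GAMMA = 0.1, 0.95

def similarity(s1, s2):
    """Hand-coded similarity: identical states score 1.0, decaying to 0
    within a distance of 3 on the chain (purely illustrative)."""
    return max(0.0, 1.0 - abs(s1 - s2) / 3.0)

def td_update(Q, s, a, r, s_next):
    """One TD(0) step whose error is also propagated to similar
    (state, action) pairs, weighted by the similarity score."""
    err = r + GAMMA * Q[s_next].max() - Q[s, a]
    for s2 in range(N_STATES):
        w = similarity(s, s2)
        if w > 0.0:
            Q[s2, a] += ALPHA * w * err
    return Q

Q = np.zeros((N_STATES, N_ACTIONS))
Q = td_update(Q, s=5, a=1, r=1.0, s_next=6)
print(Q[:, 1])  # neighbours of state 5 received a share of the update
```

Because a designer only has to supply `similarity`, not shape rewards or features, this is the kind of hand-coded knowledge the user study asked programmers to write.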

• Leah A. Zulas, Kaitlyn I. Franz, Darrin Griechen, and Matthew E. Taylor. Solar Decathlon Competition: Towards a Solar-Powered Smart Home. In Proceedings of the AI for Smart Grids and Buildings Workshop (at AAAI), February 2017.

Alternative energy is becoming a growing source of power in the United States, including wind, hydroelectric and solar. The Solar Decathlon is a competition run by the US Department of Energy every two years. Washington State University (WSU) is one of twenty teams recently selected to compete in the fall 2017 challenge. A central part of WSU’s entry is incorporating new and existing smart home technology from the ground up. The smart home can help to optimize energy loads, battery life and general comfort of the user in the home. This paper discusses the high-level goals of the project, hardware selected, build strategy and anticipated approach.

@inproceedings{2017AAAI-Solar-Zulas,
author={Zulas, A. Leah and Franz, Kaitlyn I. and Griechen, Darrin and Taylor, Matthew E.},
title={{Solar Decathlon Competition: Towards a Solar-Powered Smart Home}},
booktitle={{Proceedings of the AI for Smart Grids and Buildings Workshop (at {AAAI})}},
month={February},
year={2017},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Alternative energy is becoming a growing source of power in the United States, including wind, hydroelectric and solar. The Solar Decathlon is a competition run by the US Department of Energy every two years. Washington State University (WSU) is one of twenty teams recently selected to compete in the fall 2017 challenge. A central part of WSU’s entry is incorporating new and existing smart home technology from the ground up. The smart home can help to optimize energy loads, battery life and general comfort of the user in the home. This paper discusses the high-level goals of the project, hardware selected, build strategy and anticipated approach.}
}

### 2016

• William Curran, Tim Brys, David Aha, Matthew E. Taylor, and William D. Smart. Dimensionality Reduced Reinforcement Learning for Assistive Robots. In AAAI 2016 Fall Symposium on Artificial Intelligence: Workshop on Artificial Intelligence for Human-Robot Interaction, Arlington, VA, USA, November 2016.

State-of-the-art personal robots need to perform complex manipulation tasks to be viable in assistive scenarios. However, many of these robots, like the PR2, use manipulators with high degrees-of-freedom, and the problem is made worse in bimanual manipulation tasks. The complexity of these robots leads to high-dimensional state spaces, which are difficult to learn in. We reduce the state space by using demonstrations to discover a representative low-dimensional hyperplane in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality Reduced Reinforcement Learning (DRRL). However, when performing dimensionality reduction, not all dimensions can be fully represented. We extend this work by first learning in a single dimension, and then transferring that knowledge to a higher-dimensional hyperplane. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently using few demonstrations. We show that adding IDRRL to the Q-Learning algorithm leads to faster learning on a set of mountain car tasks and the robot swimmers problem.

@inproceedings{2016AAAI-AI-HRI-Curran,
author={William Curran and Tim Brys and David Aha and Matthew E. Taylor and William D. Smart},
title={{Dimensionality Reduced Reinforcement Learning for Assistive Robots}},
booktitle={{{AAAI} 2016 Fall Symposium on Artificial Intelligence: Workshop on Artificial Intelligence for Human-Robot Interaction}},
month={November},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={State-of-the-art personal robots need to perform complex manipulation tasks to be viable in assistive scenarios. However, many of these robots, like the PR2, use manipulators with high degrees-of-freedom, and the problem is made worse in bimanual manipulation tasks. The complexity of these robots leads to high-dimensional state spaces, which are difficult to learn in. We reduce the state space by using demonstrations to discover a representative low-dimensional hyperplane in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality Reduced Reinforcement Learning (DRRL). However, when performing dimensionality reduction, not all dimensions can be fully represented. We extend this work by first learning in a single dimension, and then transferring that knowledge to a higher-dimensional hyperplane. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently using few demonstrations. We show that adding IDRRL to the Q-Learning algorithm leads to faster learning on a set of mountain car tasks and the robot swimmers problem.}
}
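The hyperplane-discovery step can be sketched with PCA on the demonstration states (the reduction method, sizes, and data here are assumptions for illustration): the top principal directions span the low-dimensional subspace in which learning starts, and the IDRRL loop then repeats with progressively more dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical demonstrations: 50 joint-space states in 7-D whose useful
# variation lies (almost) entirely in a 2-D subspace.
latent = rng.normal(size=(50, 2))
demos = latent @ rng.normal(size=(2, 7)) + 0.01 * rng.normal(size=(50, 7))

def demo_hyperplane(demos, k):
    """PCA on the demonstrations: the top-k principal directions span a
    representative low-dimensional hyperplane in which to learn."""
    centered = demos - demos.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]                    # (k, 7) projection basis

def project(state, basis):
    return state @ basis.T           # full state -> reduced learning space

# IDRRL-style iteration: learn with k=1 first, then transfer to k=2, ...
for k in (1, 2, 3):
    basis = demo_hyperplane(demos, k)
    print(k, project(demos[0], basis).shape)
```

Learning proceeds in the projected coordinates, so the RL algorithm itself (Q-Learning in the paper) is unchanged; only its state representation shrinks.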

• Yunshu Du, Gabriel V. de la Cruz Jr., James Irwin, and Matthew E. Taylor. Initial Progress in Transfer for Deep Reinforcement Learning Algorithms. In Proceedings of Deep Reinforcement Learning: Frontiers and Challenges workshop (at IJCAI), New York City, NY, USA, July 2016.

As one of the first successful models that combine reinforcement learning techniques with deep neural networks, the Deep Q-network (DQN) algorithm has gained attention as it bridges the gap between high-dimensional sensor inputs and autonomous agent learning. However, one main drawback of DQN is the long training time required to train a single task. This work aims to leverage transfer learning (TL) techniques to speed up learning in DQN. We applied this technique in two domains, Atari games and cart-pole, and show that TL can improve DQN’s performance on both tasks without altering the network structure.

@inproceedings{2016DeepRL-Du,
author={Du, Yunshu and de la Cruz, Jr., Gabriel V. and Irwin, James and Taylor, Matthew E.},
title={{Initial Progress in Transfer for Deep Reinforcement Learning Algorithms}},
booktitle={{Proceedings of Deep Reinforcement Learning: Frontiers and Challenges workshop (at {IJCAI})}},
year={2016},
month={July},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={As one of the first successful models that combine reinforcement learning techniques with deep neural networks, the Deep Q-network (DQN) algorithm has gained attention as it bridges the gap between high-dimensional sensor inputs and autonomous agent learning. However, one main drawback of DQN is the long training time required to train a single task. This work aims to leverage transfer learning (TL) techniques to speed up learning in DQN. We applied this technique in two domains, Atari games and cart-pole, and show that TL can improve DQN’s performance on both tasks without altering the network structure.}
}
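A minimal sketch of the weight-transfer idea (a numpy stand-in, not an actual DQN; the layer sizes and the choice to copy the first two layers are assumptions): the target-task network keeps the same structure and simply starts from the source task's learned parameters for its early, feature-extraction layers.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_network(sizes):
    """Stand-in for a DQN's parameters: one weight matrix per layer."""
    return [rng.normal(scale=0.1, size=(m, n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def transfer_weights(source, target, n_layers):
    """Initialize the target-task network from the source task's first
    n_layers (the shared feature-extraction layers); later layers keep
    their fresh initialization and the structure is unchanged."""
    for i in range(n_layers):
        assert source[i].shape == target[i].shape
        target[i] = source[i].copy()
    return target

sizes = [84, 64, 32, 4]                   # invented layer sizes
source_net = make_network(sizes)          # pretend: trained on source task
target_net = transfer_weights(source_net, make_network(sizes), n_layers=2)
```

Training on the target task then resumes from `target_net` rather than from scratch, which is the speedup the abstract reports.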

• Yunshu Du and Matthew E. Taylor. Work In-progress: Mining the Student Data for Fitness. In Proceedings of the 12th International Workshop on Agents and Data Mining Interaction (ADMI) (at AAMAS), Singapore, May 2016.

Data mining-driven agents are often used in applications such as waiting times estimation or traffic flow prediction. Such approaches often require large amounts of data from multiple sources, which may be difficult to obtain and lead to incomplete or noisy datasets. University ID card data, in contrast, is easy to access with very low noise. However, little attention has been paid to the availability of these datasets and few applications have been developed to improve student services on campus. This work uses data from CougCard, the Washington State University official ID card, used daily by most students. Our goal is to build an intelligent agent to improve student service quality by predicting the crowdedness at different campus facilities. This work in-progress focuses on the University Recreation Center, one of the most popular facilities on campus, to optimize students’ workout experiences.

@inproceedings{2016ADMI-Du,
author={Yunshu Du and Matthew E. Taylor},
title={{Work In-progress: Mining the Student Data for Fitness}},
booktitle={{Proceedings of the 12th International Workshop on Agents and Data Mining Interaction ({ADMI}) (at {AAMAS})}},
year={2016},
month={May},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Data mining-driven agents are often used in applications such as waiting times estimation or traffic flow prediction. Such approaches often require large amounts of data from multiple sources, which may be difficult to obtain and lead to incomplete or noisy datasets. University ID card data, in contrast, is easy to access with very low noise. However, little attention has been paid to the availability of these datasets and few applications have been developed to improve student services on campus. This work uses data from CougCard, the Washington State University official ID card, used daily by most students. Our goal is to build an intelligent agent to improve student service quality by predicting the crowdedness at different campus facilities. This work in-progress focuses on the University Recreation Center, one of the most popular facilities on campus, to optimize students’ workout experiences.}
}

• Pablo Hernandez-Leal, Matthew E. Taylor, Benjamin Rosman, L. Enrique Sucar, and Enrique Munoz de Cote. Identifying and Tracking Switching, Non-stationary Opponents: a Bayesian Approach. In Proceedings of the Multiagent Interaction without Prior Coordination workshop (at AAAI), Phoenix, AZ, USA, February 2016.

In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. Bayesian policy reuse (BPR) has been empirically shown to be efficient at correctly detecting the best policy to use from a library in sequential decision tasks. In this paper we extend BPR to adversarial settings, in particular, to opponents that switch from one stationary strategy to another. Our proposed extension enables learning new models in an online fashion when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach is capable of efficiently detecting opponent strategies and reacting quickly to behavior switches, thereby yielding better performance than state-of-the-art approaches in terms of average rewards.

@inproceedings{2016AAAI-HernandezLeal,
author={Pablo Hernandez-Leal and Matthew E. Taylor and Benjamin Rosman and L. Enrique Sucar and Enrique {Munoz de Cote}},
title={{Identifying and Tracking Switching, Non-stationary Opponents: a Bayesian Approach}},
booktitle={{Proceedings of the Multiagent Interaction without Prior Coordination workshop (at {AAAI})}},
year={2016},
month={February},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. Bayesian policy reuse (BPR) has been empirically shown to be efficient at correctly detecting the best policy to use from a library in sequential decision tasks. In this paper we extend BPR to adversarial settings, in particular, to opponents that switch from one stationary strategy to another. Our proposed extension enables learning new models in an online fashion when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach is capable of efficiently detecting opponent strategies and reacting quickly to behavior switches, thereby yielding better performance than state-of-the-art approaches in terms of average rewards.}
}
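The belief update at the heart of Bayesian policy reuse is just Bayes' rule over a library of opponent models. A toy sketch (the three-strategy library and its observation probabilities are invented for illustration):

```python
import numpy as np

# Invented library of 3 known opponent strategies; each row models the
# probability of observing each of 4 possible signals per interaction.
obs_model = np.array([
    [0.7, 0.1, 0.1, 0.1],   # strategy A
    [0.1, 0.7, 0.1, 0.1],   # strategy B
    [0.1, 0.1, 0.4, 0.4],   # strategy C
])

def update_belief(belief, obs):
    """Bayes' rule: P(strategy | obs) is proportional to
    P(obs | strategy) * P(strategy)."""
    posterior = belief * obs_model[:, obs]
    return posterior / posterior.sum()

belief = np.ones(3) / 3.0       # uniform prior over the library
for obs in [1, 1, 3, 1]:        # signals generated mostly by strategy B
    belief = update_belief(belief, obs)
print(belief.argmax())          # strategy B is identified
```

The paper's extension kicks in when no model in the library explains recent observations well (the current policies stop performing optimally): at that point a new opponent model is learned online and added to the library.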

• David Isele, José Marcio Luna, Eric Eaton, Gabriel V. de la Cruz Jr., James Irwin, Brandon Kallaher, and Matthew E. Taylor. Work in Progress: Lifelong Learning for Disturbance Rejection on Mobile Robots. In Proceedings of the Adaptive Learning Agents (ALA) workshop (at AAMAS), Singapore, May 2016.

No two robots are exactly the same — even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Further, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled. These preliminary results are an initial step towards learning robust fault-tolerant control for arbitrary robots.

@inproceedings{2016ALA-Isele,
author={Isele, David and Luna, Jos\'e Marcio and Eaton, Eric and de la Cruz, Jr., Gabriel V. and Irwin, James and Kallaher, Brandon and Taylor, Matthew E.},
title={{Work in Progress: Lifelong Learning for Disturbance Rejection on Mobile Robots}},
booktitle={{Proceedings of the Adaptive Learning Agents ({ALA}) workshop (at {AAMAS})}},
year={2016},
month={May},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={No two robots are exactly the same --- even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Further, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled. These preliminary results are an initial step towards learning robust fault-tolerant control for arbitrary robots.}
}

• Timothy Lewis, Amy Hurst, Matthew E. Taylor, and Cynthia Matuszek. Using Language Groundings for Context-Sensitive Text Prediction. In Proceedings of EMNLP 2016 Workshop on Uphill Battles in Language Processing, Austin, TX, USA, November 2016.

In this paper, we present the concept of using language groundings for context-sensitive text prediction using a semantically informed, context-aware language model. We show initial findings from a preliminary study investigating how users react to a communication interface driven by context-based prediction using a simple language model. We suggest that the results support further exploration using a more informed semantic model and more realistic context.

@inproceedings{2016EMNLP-Lewis,
author={Timothy Lewis and Amy Hurst and Matthew E. Taylor and Cynthia Matuszek},
title={{Using Language Groundings for Context-Sensitive Text Prediction}},
booktitle={{Proceedings of {EMNLP} 2016 Workshop on Uphill Battles in Language Processing}},
month={November},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={In this paper, we present the concept of using language groundings for context-sensitive text prediction using a semantically informed, context-aware language model. We show initial findings from a preliminary study investigating how users react to a communication interface driven by context-based prediction using a simple language model. We suggest that the results support further exploration using a more informed semantic model and more realistic context.}
}

• Robert Loftin, Matthew E. Taylor, Michael L. Littman, James MacGlashan, Bei Peng, and David L. Roberts. Open Problems for Online Bayesian Inference in Neural Networks. In Proceedings of Bayesian Deep Learning workshop (at NIPS), December 2016.
@inproceedings{2016NIPS-BayesDL-Loftin,
author={Robert Loftin and Matthew E. Taylor and Michael L. Littman and James MacGlashan and Bei Peng and David L. Roberts},
title={{Open Problems for Online Bayesian Inference in Neural Networks}},
booktitle={{Proceedings of Bayesian Deep Learning workshop (at {NIPS})}},
month={December},
year={2016},
url={http://bayesiandeeplearning.org/papers/BDL_42.pdf},
bib2html_pubtype={Refereed Workshop or Symposium}
}

• Robert Loftin, James MacGlashan, Bei Peng, Matthew E. Taylor, Michael L. Littman, and David L. Roberts. Towards Behavior-Aware Model Learning from Human-Generated Trajectories. In AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction, Arlington, VA, USA, November 2016.

Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process, based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible, and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.

@inproceedings{2016AAAI-AI-HRI-Loftin,
author={Robert Loftin and James MacGlashan and Bei Peng and Matthew E. Taylor and Michael L. Littman and David L. Roberts},
title={{Towards Behavior-Aware Model Learning from Human-Generated Trajectories}},
booktitle={{{AAAI} Fall Symposium on Artificial Intelligence for Human-Robot Interaction}},
month={November},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process, based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible, and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.}
}

• James MacGlashan, Michael L. Littman, David L. Roberts, Robert Loftin, Bei Peng, and Matthew E. Taylor. Convergent Actor Critic by Humans. In Workshop on Human-Robot Collaboration: Towards Co-Adaptive Learning Through Semi-Autonomy and Shared Control (at IROS), October 2016.

Programming robot behavior can be painstaking: for a layperson, this path is unavailable without investing significant effort in building up proficiency in coding. In contrast, nearly half of American households have a pet dog and at least some exposure to animal training, suggesting an alternative path for customizing robot behavior. Unfortunately, most existing reinforcement-learning (RL) algorithms are not well suited to learning from human-delivered reinforcement. This paper introduces a framework for incorporating human-delivered rewards into RL algorithms and preliminary results demonstrating feasibility.

@inproceedings{2016IROS-HRC-MacGlashan,
author={James MacGlashan and Michael L. Littman and David L. Roberts and Robert Loftin and Bei Peng and Matthew E. Taylor},
title={{Convergent Actor Critic by Humans}},
booktitle={{Workshop on Human-Robot Collaboration: Towards Co-Adaptive Learning Through Semi-Autonomy and Shared Control (at {IROS})}},
month={October},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Programming robot behavior can be painstaking: for a layperson, this path is unavailable without investing significant effort in building up proficiency in coding. In contrast, nearly half of American households have a pet dog and at least some exposure to animal training, suggesting an alternative path for customizing robot behavior. Unfortunately, most existing reinforcement-learning (RL) algorithms are not well suited to learning from human-delivered reinforcement. This paper introduces a framework for incorporating human-delivered rewards into RL algorithms and preliminary results demonstrating feasibility.}
}
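One simple way to plug a human trainer's signal into an actor-critic-style update, not necessarily the paper's algorithm, is to use the feedback in place of the critic's advantage. Everything below, from the one-state task to the +1/-1 signal, is an invented toy to show the shape of the update:

```python
import numpy as np

rng = np.random.default_rng(2)
N_ACTIONS = 2
theta = np.zeros(N_ACTIONS)     # action preferences in a one-state toy task

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def update_from_human(theta, action, feedback, lr=0.5):
    """Policy-gradient-style step that uses the human's +1/-1 signal in
    place of a computed advantage: praised actions become more likely."""
    grad = -softmax(theta)
    grad[action] += 1.0          # gradient of log pi(action)
    return theta + lr * feedback * grad

# A simulated trainer who praises action 1 and punishes action 0.
for _ in range(50):
    a = rng.choice(N_ACTIONS, p=softmax(theta))
    theta = update_from_human(theta, a, feedback=1.0 if a == 1 else -1.0)

print(softmax(theta))  # probability mass shifts toward the praised action
```

The dog-training analogy in the abstract maps directly onto the `feedback` argument: the trainer never writes code, only rewards and punishes sampled behavior.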

• Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. An Empirical Study of Non-Expert Curriculum Design for Machine Learners. In Proceedings of the Interactive Machine Learning workshop (at IJCAI), New York City, NY, USA, July 2016.

Existing machine-learning work has shown that algorithms can benefit from curriculum learning, a strategy where the target behavior of the learner is changed over time. However, most existing work focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, neglecting how non-expert humans may design curricula. In this work we introduce a curriculum-design problem in the context of reinforcement learning and conduct a user study to explicitly explore how non-expert humans go about assembling curricula. We present results from 80 participants on Amazon Mechanical Turk that show 1) humans can successfully design curricula that gradually introduce more complex concepts to the agent within each curriculum, and even across different curricula, and 2) users choose to add task complexity in different ways and follow salient principles when selecting tasks into the curriculum. This work serves as an important first step towards better integration of non-expert humans into the reinforcement learning process and the development of new machine learning algorithms to accommodate human teaching strategies.

@inproceedings{2016IML-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{An Empirical Study of Non-Expert Curriculum Design for Machine Learners}},
booktitle={{Proceedings of the Interactive Machine Learning workshop (at {IJCAI})}},
month={July},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Existing machine-learning work has shown that algorithms can benefit from curriculum learning, a strategy where the target behavior of the learner is changed over time. However, most existing work focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, neglecting how non-expert humans may design curricula. In this work we introduce a curriculum-design problem in the context of reinforcement learning and conduct a user study to explicitly explore how non-expert humans go about assembling curricula. We present results from 80 participants on Amazon Mechanical Turk that show 1) humans can successfully design curricula that gradually introduce more complex concepts to the agent within each curriculum, and even across different curricula, and 2) users choose to add task complexity in different ways and follow salient principles when selecting tasks into the curriculum. This work serves as an important first step towards better integration of non-expert humans into the reinforcement learning process and the development of new machine learning algorithms to accommodate human teaching strategies.}
}

• Zhaodong Wang and Matthew E. Taylor. Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study. In AAAI 2016 Spring Symposium, March 2016.

There are many successful methods for transferring information from one agent to another. One approach, taken in this work, is to have one (source) agent demonstrate a policy to a second (target) agent, and then have that second agent improve upon the policy. By allowing the target agent to observe the source agent’s demonstrations, rather than relying on other types of direct knowledge transfer like Q-values, rules, or shared representations, we remove the need for the agents to know anything about each other’s internal representation or have a shared language. In this work, we introduce a refinement to HAT, an existing transfer learning method, by integrating the target agent’s confidence in its representation of the source agent’s policy. Results show that a target agent can effectively 1) improve its initial performance relative to learning without transfer (jumpstart) and 2) improve its performance relative to the source agent (total reward). Furthermore, both the jumpstart and total reward are improved with this new refinement, relative to learning without transfer and relative to learning with HAT.

@inproceedings{2016AAAI-SSS-Wang,
author={Zhaodong Wang and Matthew E. Taylor},
title={{Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study}},
booktitle={{{AAAI} 2016 Spring Symposium}},
month={March},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={There are many successful methods for transferring information from one agent to another. One approach, taken in this work, is to have one (source) agent demonstrate a policy to a second (target) agent, and then have that second agent improve upon the policy. By allowing the target agent to observe the source agent's demonstrations, rather than relying on other types of direct knowledge transfer like Q-values, rules, or shared representations, we remove the need for the agents to know anything about each other's internal representation or have a shared language. In this work, we introduce a refinement to HAT, an existing transfer learning method, by integrating the target agent's confidence in its representation of the source agent's policy. Results show that a target agent can effectively 1) improve its initial performance relative to learning without transfer (jumpstart) and 2) improve its performance relative to the source agent (total reward). Furthermore, both the jumpstart and total reward are improved with this new refinement, relative to learning without transfer and relative to learning with HAT.}
}

• Ruofei Xu, Robin Hartshorn, Ryan Huard, James Irwin, Kaitlyn Johnson, Gregory Nelson, Jon Campbell, Sakire Arslan Ay, and Matthew E. Taylor. Towards a Semi-Autonomous Wheelchair for Users with ALS. In Proceedings of Workshop on Autonomous Mobile Service Robots (at IJCAI), New York City, NY, USA, July 2016.

This paper discusses a prototype system built over two years by teams of undergraduate students with the goal of assisting users with Amyotrophic Lateral Sclerosis (ALS). The current prototype powered wheelchair uses both onboard and offboard sensors to navigate within and between rooms, avoiding obstacles. The wheelchair can be directly controlled via multiple input devices, including gaze tracking — in this case, the wheelchair can augment the user’s control to avoid obstacles. In its fully autonomous mode, the user can select a position on a pre-built map and the wheelchair will navigate to the desired location. This paper introduces the design and implementation of our system, as well as performs three sets of experiments to characterize its performance. The long-term goal of this work is to significantly improve the lives of users with mobility impairments, with a particular focus on those that have limited motor abilities.

@inproceedings{2016IJCAI-Xu,
author={Ruofei Xu and Robin Hartshorn and Ryan Huard and James Irwin and Kaitlyn Johnson and Gregory Nelson and Jon Campbell and Sakire Arslan Ay and Matthew E. Taylor},
title={{Towards a Semi-Autonomous Wheelchair for Users with {ALS}}},
booktitle={{Proceedings of Workshop on Autonomous Mobile Service Robots (at {IJCAI})}},
year={2016},
month={July},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={This paper discusses a prototype system built over two years by teams of undergraduate students with the goal of assisting users with Amyotrophic Lateral Sclerosis (ALS). The current prototype powered wheelchair uses both onboard and offboard sensors to navigate within and between rooms, avoiding obstacles. The wheelchair can be directly controlled via multiple input devices, including gaze tracking --- in this case, the wheelchair can augment the user's control to avoid obstacles. In its fully autonomous mode, the user can select a position on a pre-built map and the wheelchair will navigate to the desired location. This paper introduces the design and implementation of our system, as well as performs three sets of experiments to characterize its performance. The long-term goal of this work is to significantly improve the lives of users with mobility impairments, with a particular focus on those that have limited motor abilities.}
}

### 2015

• Gabriel V. de la Cruz Jr., Bei Peng, Walter S. Lasecki, and Matthew E. Taylor. Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents. In Proceedings of the Learning for General Competency in Video Games workshop (AAAI), January 2015.

Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Our results demonstrate that the crowd is capable of generating helpful input. We conclude with a discussion of the types of errors that occur most commonly when engaging human workers for this task, and a discussion of how such data could be used to improve learning. Our work serves as a critical first step in designing systems that use real-time human feedback to improve the learning performance of automated systems on-the-fly.

@inproceedings(2015AAAI-Delacruz,
title={{Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents}},
author={de la Cruz, Jr., Gabriel V. and Peng, Bei and Lasecki, Walter S. and Taylor, Matthew E.},
booktitle={{Proceedings of the Learning for General Competency in Video Games workshop ({AAAI})}},
month={January},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Crowdsourcing},
bib2html_funding={NSF},
abstract={Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Our results demonstrate that the crowd is capable of generating helpful input. We conclude with a discussion of the types of errors that occur most commonly when engaging human workers for this task, and a discussion of how such data could be used to improve learning. Our work serves as a critical first step in designing systems that use real-time human feedback to improve the learning performance of automated systems on-the-fly.},
)

• William Curran, Tim Brys, Matthew E. Taylor, and William D. Smart. Using PCA to Efficiently Represent State Spaces. In ICML-2015 European Workshop on Reinforcement Learning, Lille, France, July 2015.

Reinforcement learning algorithms need to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces. This is known as the curse of dimensionality. By projecting the agent's state onto a low-dimensional manifold, we can represent the state space in a smaller and more efficient representation. By using this representation during learning, the agent can converge to a good policy much faster. We test this approach in the Mario Benchmarking Domain. When using dimensionality reduction in Mario, learning converges much faster to a good policy. But there is a critical convergence-performance trade-off: by projecting onto a low-dimensional manifold, we are ignoring important data. In this paper, we explore this trade-off between convergence and performance. We find that by learning in as few as 4 dimensions (instead of 9), we can exceed the performance of learning in the full-dimensional space while converging faster.

@inproceedings{2015ICML-Curran,
author={William Curran and Tim Brys and Matthew E. Taylor and William D. Smart},
title={{Using PCA to Efficiently Represent State Spaces}},
booktitle={{{ICML}-2015 European Workshop on Reinforcement Learning}},
month={July},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Reinforcement learning algorithms need to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces. This is known as the curse of dimensionality. By projecting the agent's state onto a low-dimensional manifold, we can represent the state space in a smaller and more efficient representation. By using this representation during learning, the agent can converge to a good policy much faster. We test this approach in the Mario Benchmarking Domain. When using dimensionality reduction in Mario, learning converges much faster to a good policy. But there is a critical convergence-performance trade-off: by projecting onto a low-dimensional manifold, we are ignoring important data. In this paper, we explore this trade-off between convergence and performance. We find that by learning in as few as 4 dimensions (instead of 9), we can exceed the performance of learning in the full-dimensional space while converging faster.}
}

• Pablo Hernandez-Leal, Matthew E. Taylor, Enrique Munoz de Cote, and Enrique L. Sucar. Learning Against Non-Stationary Opponents in Double Auctions. In Proceedings of the Adaptive Learning Agents (ALA) workshop 2015, Istanbul, Turkey, May 2015. Finalist for Best Student Paper.

Energy markets are emerging around the world. In this context, the PowerTAC competition has gained attention for being a realistic and powerful simulation platform that can be used to perform robust research on retail energy markets. Agents in this complex environment typically use different strategies throughout their interaction, changing from one to another depending on diverse factors, for example, to adapt to population needs and to keep the competitors guessing. This poses a problem for learning algorithms, as most of them are not capable of handling changing strategies. The previous champion of the PowerTAC competition is no exception, and is not capable of adapting quickly to non-stationary opponents, potentially impacting its performance. This paper introduces DriftER, an algorithm that learns a model of the opponent and keeps track of its error-rate. When the error-rate increases for several timesteps, the opponent has most likely changed strategy and the agent should learn a new model. Results in the PowerTAC simulator show that DriftER detects switches in the opponent faster than existing state-of-the-art algorithms against switching (non-stationary) opponents, obtaining better results in terms of profit and accuracy.

@inproceedings{2015ALA-HernandezLeal,
author={Pablo Hernandez-Leal and Matthew E. Taylor and Munoz de Cote, Enrique and Sucar, L. Enrique},
title={{Learning Against Non-Stationary Opponents in Double Auctions}},
booktitle={{Proceedings of the Adaptive Learning Agents ({ALA}) workshop 2015}},
year={2015},
month={May},
note = {Finalist for Best Student Paper},
abstract = {Energy markets are emerging around the world. In this context, the PowerTAC competition has gained attention for being a realistic and powerful simulation platform that can be used to perform robust research on retail energy markets. Agents in this complex environment typically use different strategies throughout their interaction, changing from one to another depending on diverse factors, for example, to adapt to population needs and to keep the competitors guessing. This poses a problem for learning algorithms, as most of them are not capable of handling changing strategies. The previous champion of the PowerTAC competition is no exception, and is not capable of adapting quickly to non-stationary opponents, potentially impacting its performance. This paper introduces DriftER, an algorithm that learns a model of the opponent and keeps track of its error-rate. When the error-rate increases for several timesteps, the opponent has most likely changed strategy and the agent should learn a new model. Results in the PowerTAC simulator show that DriftER detects switches in the opponent faster than existing state-of-the-art algorithms against switching (non-stationary) opponents, obtaining better results in terms of profit and accuracy.}
}

• Bei Peng, Robert Loftin, James MacGlashan, Michael L. Littman, Matthew E. Taylor, and David L. Roberts. Language and Policy Learning from Human-delivered Feedback. In Proceedings of the Machine Learning for Social Robotics workshop (at ICRA), May 2015.

Using rewards and punishments is a common and familiar paradigm for humans to train intelligent agents. Most existing learning algorithms in this paradigm follow a framework in which human feedback is treated as a numerical signal to be maximized by the agent. However, treating feedback as a numeric signal fails to capitalize on implied information the human trainer conveys with a lack of explicit feedback. For example, a trainer may withhold reward to signal to the agent a failure, or they may withhold punishment to signal that the agent is behaving correctly. We review our progress to date with Strategy-aware Bayesian Learning, which is able to learn from experience the ways trainers use feedback, and can exploit that knowledge to accelerate learning. Our work covers contextual bandits, goal-directed sequential decision-making tasks, and natural language command learning. We present a user study design to identify how users’ feedback strategies are affected by properties of the environment and agent competency for natural language command learning in sequential decision making tasks, which will inform the development of more adaptive models of human feedback in the future.

@inproceedings{2015ICRA-Peng,
author={Bei Peng and Robert Loftin and James MacGlashan and Michael L. Littman and Matthew E. Taylor and David L. Roberts},
title={{Language and Policy Learning from Human-delivered Feedback}},
booktitle={{Proceedings of the Machine Learning for Social Robotics workshop (at {ICRA})}},
month={May},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={Using rewards and punishments is a common and familiar paradigm for humans to train intelligent agents. Most existing learning algorithms in this paradigm follow a framework in which human feedback is treated as a numerical signal to be maximized by the agent. However, treating feedback as a numeric signal fails to capitalize on implied information the human trainer conveys with a lack of explicit feedback. For example, a trainer may withhold reward to signal to the agent a failure, or they may withhold punishment to signal that the agent is behaving correctly. We review our progress to date with Strategy-aware Bayesian Learning, which is able to learn from experience the ways trainers use feedback, and can exploit that knowledge to accelerate learning. Our work covers contextual bandits, goal-directed sequential decision-making tasks, and natural language command learning. We present a user study design to identify how users' feedback strategies are affected by properties of the environment and agent competency for natural language command learning in sequential decision making tasks, which will inform the development of more adaptive models of human feedback in the future.}
}

• Mitchell Scott, Bei Peng, Madeline Chili, Tanay Nigam, Francis Pascual, Cynthia Matuszek, and Matthew E. Taylor. On the Ability to Provide Demonstrations on a UAS: Observing 90 Untrained Participants Abusing a Flying Robot. In Proceedings of the AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction (AI-HRI), November 2015.

This paper presents an exploratory study where participants piloted a commercial UAS (unmanned aerial system) through an obstacle course. The goal was to determine how varying the instructions given to participants affected their performance. Preliminary data suggests future studies to perform, as well as guidelines for human-robot interaction, and some best practices for learning from demonstration studies.

@inproceedings{2015AI_HRI-Scott,
author={Mitchell Scott and Bei Peng and Madeline Chili and Tanay Nigam and Francis Pascual and Cynthia Matuszek and Matthew E. Taylor},
title={{On the Ability to Provide Demonstrations on a UAS: Observing 90 Untrained Participants Abusing a Flying Robot}},
booktitle={{Proceedings of the {AAAI} Fall Symposium on Artificial Intelligence and Human-Robot Interaction ({AI-HRI})}},
month={November},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={This paper presents an exploratory study where participants piloted a commercial UAS (unmanned aerial system) through an obstacle course. The goal was to determine how varying the instructions given to participants affected their performance. Preliminary data suggests future studies to perform, as well as guidelines for human-robot interaction, and some best practices for learning from demonstration studies.}
}

• Yusen Zhan and Matthew E. Taylor. Online Transfer Learning in Reinforcement Learning Domains. In Proceedings of the AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), November 2015.

This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation under a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.

@inproceedings{2015SDMIA-Zhan,
author={Yusen Zhan and Matthew E. Taylor},
title={{Online Transfer Learning in Reinforcement Learning Domains}},
booktitle={{Proceedings of the {AAAI} Fall Symposium on Sequential Decision Making for Intelligent Agents ({SDMIA})}},
month={November},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation under a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.}
}

• Yawei Zhang, Yunxiang Ye, Zhaodong Wang, Matthew E. Taylor, Geoffrey A. Hollinger, and Qin Zhang. Intelligent In-Orchard Bin-Managing System for Tree Fruit Production. In Proceedings of the Robotics in Agriculture workshop (at ICRA), May 2015.

The labor-intensive nature of harvest in the tree fruit industry makes it particularly sensitive to labor shortages. Technological innovation is thus critical in order to meet current demands without significantly increasing prices. This paper introduces a robotic system to help human workers during fruit harvest. A second-generation prototype is currently being built and simulation results demonstrate potential improvement in productivity.

@inproceedings{2015ICRA-Zhang,
author={Yawei Zhang and Yunxiang Ye and Zhaodong Wang and Matthew E. Taylor and Geoffrey A. Hollinger and Qin Zhang},
title={{Intelligent In-Orchard Bin-Managing System for Tree Fruit Production}},
booktitle={{Proceedings of the Robotics in Agriculture workshop (at {ICRA})}},
month={May},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={The labor-intensive nature of harvest in the tree fruit industry makes it particularly sensitive to labor shortages. Technological innovation is thus critical in order to meet current demands without significantly increasing prices. This paper introduces a robotic system to help human workers during fruit harvest. A second-generation prototype is currently being built and simulation results demonstrate potential improvement in productivity.}
}

### 2014

• Haitham Bou Ammar, Eric Eaton, Matthew E. Taylor, Decebal C. Mocanu, Kurt Driessens, Gerhard Weiss, and Karl Tuyls. An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning. In Proceedings of the Machine Learning for Interactive Systems workshop (at AAAI), July 2014.
@inproceedings(2014MLIS-BouAmmar,
title={{An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning}},
author={Haitham Bou Ammar and Eric Eaton and Matthew E. Taylor and Decebal C. Mocanu and Kurt Driessens and Gerhard Weiss and Karl Tuyls},
booktitle={{Proceedings of the Machine Learning for Interactive Systems workshop (at {AAAI})}},
month={July},
year={2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Transfer Learning}
)

• Chris HolmesParker, Matthew E. Taylor, Yusen Zhan, and Kagan Tumer. Exploiting Structure and Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS), May 2014.
@inproceedings(2014ALA-HolmesParker,
author={Chris HolmesParker and Matthew E. Taylor and Yusen Zhan and Kagan Tumer},
title={{Exploiting Structure and Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS})}},
month={May},
year= {2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
)

• James MacGlashan, Michael L. Littman, Robert Loftin, Bei Peng, David Roberts, and Matthew E. Taylor. Training an Agent to Ground Commands with Reward and Punishment. In Proceedings of the Machine Learning for Interactive Systems workshop (at AAAI), July 2014.
@inproceedings(2014MLIS-James,
title={{Training an Agent to Ground Commands with Reward and Punishment}},
author={James MacGlashan and Michael L. Littman and Robert Loftin and Bei Peng and David Roberts and Matthew E. Taylor},
booktitle={{Proceedings of the Machine Learning for Interactive Systems workshop (at {AAAI})}},
month={July},
year={2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning}
)

• Yusen Zhan, Anestis Fachantidis, Ioannis Vlahavas, and Matthew E. Taylor. Agents Teaching Humans in Reinforcement Learning Tasks. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS), May 2014.
@inproceedings(2014ALA-Zhan,
author={Yusen Zhan and Anestis Fachantidis and Ioannis Vlahavas and Matthew E. Taylor},
title={{Agents Teaching Humans in Reinforcement Learning Tasks}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS})}},
month={May},
year= {2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
)

### 2013

• Ravi Balasubramanian and Matthew E. Taylor. Learning for Mobile-Robot Error Recovery (Extended Abstract). In The AAAI 2013 Spring Symposium — Designing Intelligent Robots: Reintegrating AI II, March 2013.
@inproceedings(AAAI13Symp-Balasubramanian,
author={Ravi Balasubramanian and Matthew E. Taylor},
title={{Learning for Mobile-Robot Error Recovery (Extended Abstract)}},
booktitle={{The {AAAI} 2013 Spring Symposium --- Designing Intelligent Robots: Reintegrating {AI} {II}}},
month={March},
year= {2013},
wwwnote={<a href="http://people.csail.mit.edu/gdk/dir2/">Designing Intelligent Robots</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning, Robotics},
)

• Nicholas Carboni and Matthew E. Taylor. Preliminary Results for 1 vs. 1 Tactics in Starcraft. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), May 2013.

This paper describes the development and analysis of two algorithms designed to allow one agent, the teacher, to give advice to another agent, the student. These algorithms contribute to a family of algorithms designed to allow teaching with limited advice. We compare the ability of the student to learn using reinforcement learning with and without such advice. Experiments are conducted in the Starcraft domain, a challenging but appropriate domain for this type of research. Our results show that the time at which advice is given has a significant effect on the result of student learning and that agents with the best performance in a task may not always be the most effective teachers.

@inproceedings(ALA13-Carboni,
author={Nicholas Carboni and Matthew E. Taylor},
title={{Preliminary Results for 1 vs.~1 Tactics in Starcraft}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={May},
year= {2013},
wwwnote={<a href="http://swarmlab.unimaas.nl/ala2013/">ALA-13</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
abstract={This paper describes the development and analysis of two algorithms designed to allow one agent, the teacher, to give advice to another agent, the student. These algorithms contribute to a family of algorithms designed to allow teaching with limited advice. We compare the ability of the student to learn using reinforcement learning with and without such advice. Experiments are conducted in the Starcraft domain, a challenging but appropriate domain for this type of research. Our results show that the time at which advice is given has a significant effect on the result of student learning and that agents with the best performance in a task may not always be the most effective teachers.},
)

• Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. Autonomous Selection of Inter-Task Mappings in Transfer Learning (extended abstract). In The AAAI 2013 Spring Symposium — Lifelong Machine Learning, March 2013.
@inproceedings(AAAI13-Anestis,
author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
title={{Autonomous Selection of Inter-Task Mappings in Transfer Learning (extended abstract)}},
booktitle={{The {AAAI} 2013 Spring Symposium --- Lifelong Machine Learning}},
month={March},
year= {2013},
wwwnote={<a href="http://cs.brynmawr.edu/~eeaton/AAAI-SSS13-LML/">Lifelong Machine Learning</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning, Robotics},
)

• Tong Pham, Tim Brys, and Matthew E. Taylor. Learning Coordinated Traffic Light Control. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), May 2013.

Traffic jams and suboptimal traffic flows are ubiquitous in our modern societies, and they create enormous economic losses each year. Delays at traffic lights alone contribute roughly 10 percent of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Some distributed constraint optimization approaches have also been used, but focus on cases where the traffic flows are known. This paper presents a preliminary comparison between these two classes of optimization methods in a complex simulator, with the goal of eventually producing real-time algorithms that could be deployed in real-world situations.

@inproceedings(ALA13-Pham,
author={Tong Pham and Tim Brys and Matthew E. Taylor},
title={{Learning Coordinated Traffic Light Control}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={May},
year= {2013},
wwwnote={<a href="http://swarmlab.unimaas.nl/ala2013/">ALA-13</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning,DCOP},
abstract={Traffic jams and suboptimal traffic flows are ubiquitous in our modern societies, and they create enormous economic losses each year. Delays at traffic lights alone contribute roughly 10 percent of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Some distributed constraint optimization approaches have also been used, but focus on cases where the traffic flows are known. This paper presents a preliminary comparison between these two classes of optimization methods in a complex simulator, with the goal of eventually producing real-time algorithms that could be deployed in real-world situations.},
)

### 2012

• Matthew Adams, Robert Loftin, Matthew E. Taylor, Michael Littman, and David Roberts. An Empirical Analysis of RL’s Drift From Its Behaviorism Roots. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), June 2012.

We present an empirical survey of reinforcement learning techniques and relate these techniques to concepts from behaviorism, a field of psychology concerned with the learning process. Specifically, we examine two standard RL algorithms, model-free SARSA and model-based R-MAX, when used with various shaping techniques. We consider multiple techniques for incorporating shaping into these algorithms, including the use of options and potential-based shaping. Findings indicate that any improvement in sample complexity that results from shaping is limited at best. We suggest that this is either due to reinforcement learning not modeling behaviorism well, or behaviorism not modeling animal learning well. We further suggest that a paradigm shift in reinforcement learning techniques is required before the kind of learning performance that techniques from behaviorism indicate are possible can be realized.

@inproceedings(ALA12-Adams,
author={Matthew Adams and Robert Loftin and Matthew E. Taylor and Michael Littman and David Roberts},
title={{An Empirical Analysis of {RL}'s Drift From Its Behaviorism Roots}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={June},
year={2012},
wwwnote={<a href="http://como.vub.ac.be/ALA2012/">ALA-12</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
abstract={We present an empirical survey of reinforcement learning techniques and relate these techniques to concepts from behaviorism, a field of psychology concerned with the learning process. Specifically, we examine two standard RL algorithms, model-free SARSA and model-based R-MAX, when used with various shaping techniques. We consider multiple techniques for incorporating shaping into these algorithms, including the use of options and potential-based shaping. Findings indicate that any improvement in sample complexity that results from shaping is limited at best. We suggest that this is due either to reinforcement learning not modeling behaviorism well, or to behaviorism not modeling animal learning well. We further suggest that a paradigm shift in reinforcement learning techniques is required before the kind of learning performance that behaviorism indicates is possible can be realized.},
)

• Sanjeev Sharma and Matthew E. Taylor. Autonomous Waypoint Generation Strategy for On-Line Navigation in Unknown Environments. In IROS Workshop on Robot Motion Planning: Online, Reactive, and in Real-Time, October 2012.
@INPROCEEDINGS{IROSWS12-Sharma,
author={Sanjeev Sharma and Matthew E. Taylor},
title={{Autonomous Waypoint Generation Strategy for On-Line Navigation in Unknown Environments}},
booktitle={{{IROS} Workshop on Robot Motion Planning: Online, Reactive, and in Real-Time}},
year={2012},
month={October},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Robotics},
}

• Lisa Torrey and Matthew E. Taylor. Help an Agent Out: Student/Teacher Learning in Sequential Decision Tasks. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), June 2012.

Research on agents has led to the development of algorithms for learning from experience, accepting guidance from humans, and imitating experts. This paper explores a new direction for agents: the ability to teach other agents. In particular, we focus on situations where the teacher has limited expertise and instructs the student through action advice. The paper proposes and evaluates several teaching algorithms based on providing advice at a gradually decreasing rate. A crucial component of these algorithms is the ability of an agent to estimate its confidence in a state. We also contribute a student/teacher framework for implementing teaching strategies, which we hope will spur additional development in this relatively unexplored area.

@inproceedings(ALA12-Torrey,
author={Lisa Torrey and Matthew E. Taylor},
title={{Help an Agent Out: Student/Teacher Learning in Sequential Decision Tasks}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={June},
year={2012},
wwwnote={<a href="http://como.vub.ac.be/ALA2012/">ALA-12</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
abstract={Research on agents has led to the development of algorithms for learning from experience, accepting guidance from humans, and imitating experts. This paper explores a new direction for agents: the ability to teach other agents. In particular, we focus on situations where the teacher has limited expertise and instructs the student through action advice. The paper proposes and evaluates several teaching algorithms based on providing advice at a gradually decreasing rate. A crucial component of these algorithms is the ability of an agent to estimate its confidence in a state. We also contribute a student/teacher framework for implementing teaching strategies, which we hope will spur additional development in this relatively unexplored area.},
)

### 2011

• Scott Alfeld, Kumera Berkele, Stephen A. Desalvo, Tong Pham, Daniel Russo, Lisa Yan, and Matthew E. Taylor. Reducing the Team Uncertainty Penalty: Empirical and Theoretical Approaches. In Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains (AAMAS), May 2011.
@inproceedings(MSDM11-Alfeld,
author="Scott Alfeld and Kumera Berkele and Stephen A. Desalvo and Tong Pham and Daniel Russo and Lisa Yan and Matthew E. Taylor",
title="Reducing the Team Uncertainty Penalty: Empirical and Theoretical Approaches",
booktitle="Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains (AAMAS)",
month="May",
year= "2011",
wwwnote={<a href="http://teamcore.usc.edu/junyounk/msdm2011/">MSDM-11</a>},
bib2html_pubtype = {Refereed Workshop or Symposium},
bib2html_rescat = {DCOP},
)

• Haitham Bou Ammar and Matthew E. Taylor. Common Subspace Transfer for Reinforcement Learning Tasks. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), May 2011.
@inproceedings(ALA11-Ammar,
author="Haitham Bou Ammar and Matthew E. Taylor",
title="Common Subspace Transfer for Reinforcement Learning Tasks",
booktitle="Proceedings of the Adaptive and Learning Agents workshop (AAMAS)",
month="May",
year= "2011",
wwwnote={<a href="http://como.vub.ac.be/ALA2011/">ALA-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
)

• Haitham Bou Ammar, Matthew E. Taylor, Karl Tuyls, and Gerhard Weiss. Reinforcement Learning Transfer using a Sparse Coded Inter-Task Mapping. In Proceedings of the European Workshop on Multi-agent Systems, November 2011.
@inproceedings(EUMASS11-Amar,
author={Haitham Bou Ammar and Matthew E. Taylor and Karl Tuyls and Gerhard Weiss},
title={{Reinforcement Learning Transfer using a Sparse Coded Inter-Task Mapping}},
booktitle={{Proceedings of the European Workshop on Multi-agent Systems}},
month={November},
year={2011},
wwwnote={<a href="http://swarmlab.unimaas.nl/eumas2011/">EUMAS-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
)

• Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. Transfer Learning via Multiple Inter-Task Mappings. In Proceedings of the European Workshop on Reinforcement Learning (ECML), September 2011.
@inproceedings{EWRL11-Fachantidis,
author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
title={{Transfer Learning via Multiple Inter-Task Mappings}},
booktitle={{Proceedings of the European Workshop on Reinforcement Learning ({ECML})}},
month = {September},
year={2011},
wwwnote={<a href="http://ewrl.wordpress.com/ewrl9-2011/">EWRL-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
}

• W. Bradley Knox, Matthew E. Taylor, and Peter Stone. Understanding Human Teaching Modalities in Reinforcement Learning Environments: A Preliminary Report. In Proceedings of the Agents Learning Interactively from Human Teachers workshop (IJCAI), July 2011.
@inproceedings{ALIHT11-Knox,
author={W. Bradley Knox and Matthew E. Taylor and Peter Stone},
title={{Understanding Human Teaching Modalities in Reinforcement Learning Environments: A Preliminary Report}},
booktitle={{Proceedings of the Agents Learning Interactively from Human Teachers workshop ({IJCAI})}},
month={July},
year={2011},
bib2html_pubtype={Refereed Workshop or Symposium},
}

• Jun-young Kwak, Zhengyu Yin, Rong Yang, Matthew E. Taylor, and Milind Tambe. Robust Execution-time Coordination in DEC-POMDPs Under Model Uncertainty. In Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains (AAMAS), May 2011.
@inproceedings(MSDM11-Kwak,
author={Jun-young Kwak and Zhengyu Yin and Rong Yang and Matthew E. Taylor and Milind Tambe},
title={{Robust Execution-time Coordination in {DEC-POMDPs} Under Model Uncertainty}},
booktitle={{Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains ({AAMAS})}},
month={May},
year={2011},
wwwnote={<a href="http://teamcore.usc.edu/junyounk/msdm2011/">MSDM-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Distributed POMDPs},
)

• Paul Scerri, Balajee Kannan, Pras Velagapudi, Kate Macarthur, Peter Stone, Matthew E. Taylor, John Dolan, Alessandro Farinelli, Archie Chapman, Bernadine Dias, and George Kantor. Flood Disaster Mitigation: A Real-world Challenge Problem for Multi-Agent Unmanned Surface Vehicles. In Proceedings of the Autonomous Robots and Multirobot Systems workshop (AAMAS), May 2011.

As we advance the state of technology for robotic systems, there is a need for defining complex real-world challenge problems for the multi-agent/robot community to address. A well-defined challenge problem can motivate researchers to aggressively address and overcome core domain challenges that might otherwise take years to solve. As the focus of multi-agent research shifts from the mature domains of UGVs and UAVs to USVs, there is a need for outlining well-defined and realistic challenge problems. In this position paper, we define one such problem, flood disaster mitigation. The ability to respond quickly and effectively to disasters is essential to saving lives and limiting the scope of damage. The nature of floods dictates the need for a fleet of low-cost and small autonomous boats that can provide situational awareness (SA) and damage assessment, and deliver supplies before more traditional emergency response assets can access an affected area. In addition to addressing an essential need, the outlined application provides an interesting challenge problem for advancing fundamental research in multi-agent systems (MAS) specific to the USV domain. In this paper, we present a technical statement of this MAS challenge problem and outline MAS-specific technical constraints based on the associated real-world constraints. Core MAS sub-problems that must be solved for this application include coordination, control, human interaction, autonomy, task allocation, and communication. This problem provides a concrete and real-world MAS application that will bring together researchers with a diverse range of expertise to develop and implement the necessary algorithms and mechanisms.

@inproceedings(ARMS11-Scerri,
author={Paul Scerri and Balajee Kannan and Pras Velagapudi and Kate Macarthur and Peter Stone and Matthew E. Taylor and John Dolan and Alessandro Farinelli and Archie Chapman and Bernadine Dias and George Kantor},
title={{Flood Disaster Mitigation: A Real-world Challenge Problem for Multi-Agent Unmanned Surface Vehicles}},
booktitle={{Proceedings of the Autonomous Robots and Multirobot Systems workshop ({AAMAS})}},
month={May},
year={2011},
abstract={As we advance the state of technology for robotic systems, there is a need for defining complex real-world challenge problems for the multi-agent/robot community to address. A well-defined challenge problem can motivate researchers to aggressively address and overcome core domain challenges that might otherwise take years to solve. As the focus of multi-agent research shifts from the mature domains of UGVs and UAVs to USVs, there is a need for outlining well-defined and realistic challenge problems. In this position paper, we define one such problem, flood disaster mitigation. The ability to respond quickly and effectively to disasters is essential to saving lives and limiting the scope of damage. The nature of floods dictates the need for a fleet of low-cost and small autonomous boats that can provide situational awareness (SA) and damage assessment, and deliver supplies before more traditional emergency response assets can access an affected area. In addition to addressing an essential need, the outlined application provides an interesting challenge problem for advancing fundamental research in multi-agent systems (MAS) specific to the USV domain. In this paper, we present a technical statement of this MAS challenge problem and outline MAS-specific technical constraints based on the associated real-world constraints. Core MAS sub-problems that must be solved for this application include coordination, control, human interaction, autonomy, task allocation, and communication. This problem provides a concrete and real-world MAS application that will bring together researchers with a diverse range of expertise to develop and implement the necessary algorithms and mechanisms.},
wwwnote={<a href="http://www.alg.ewi.tudelft.nl/arms2011/">ARMS-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
)

• Matthew E. Taylor, Halit Bener Suay, and Sonia Chernova. Using Human Demonstrations to Improve Reinforcement Learning. In The AAAI 2011 Spring Symposium — Help Me Help You: Bridging the Gaps in Human-Agent Collaboration, March 2011.

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration.

@inproceedings(AAAI11Symp-Taylor,
author={Matthew E. Taylor and Halit Bener Suay and Sonia Chernova},
title={{Using Human Demonstrations to Improve Reinforcement Learning}},
booktitle={{The {AAAI} 2011 Spring Symposium --- Help Me Help You: Bridging the Gaps in Human-Agent Collaboration}},
month={March},
year={2011},
wwwnote={<a href="http://www.isi.edu/~maheswar/hmhy2011.html">HMHY2011</a>},
abstract={This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
)

• Matthew E. Taylor. Model Assignment: Reinforcement Learning in a Generalized Mario Domain. In Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence, August 2011.
@inproceedings(EAAI11-ModelAssignment,
author={Matthew E. Taylor},
title={{Model Assignment: Reinforcement Learning in a Generalized Mario Domain}},
booktitle={{Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence}},
month={August},
year={2011},
wwwnote={<a href="http://eaai.stanford.edu">EAAI-11</a><br><a href="http://www.cs.lafayette.edu/~taylorm/11EAAI/index.html">Assignment Webpage</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Pedagogy},
)

• Matthew E. Taylor. Teaching Reinforcement Learning with Mario: An Argument and Case Study. In Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence, August 2011.
@inproceedings(EAAI11-Taylor,
author={Matthew E. Taylor},
title={{Teaching Reinforcement Learning with Mario: An Argument and Case Study}},
booktitle={{Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence}},
month={August},
year={2011},
wwwnote={<a href="http://eaai.stanford.edu">EAAI-11</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Pedagogy},
)

• Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone. Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), April 2011.
@inproceedings(ADPRL11-Whiteson,
author={Shimon Whiteson and Brian Tanner and Matthew E. Taylor and Peter Stone},
title={{Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning}},
booktitle={{Proceedings of the {IEEE} Symposium on Adaptive Dynamic Programming and Reinforcement Learning ({ADPRL})}},
month={April},
year={2011},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
)

### 2010

• Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe. Towards a Theoretic Understanding of DCEE. In Proceedings of the Distributed Constraint Reasoning workshop (AAMAS), May 2010.

Common wisdom says that the greater the level of teamwork, the higher the performance of the team. In teams of cooperative autonomous agents, working together rather than independently can increase the team reward. However, recent results show that in uncertain environments, increasing the level of teamwork can actually decrease overall performance. Coined the team uncertainty penalty, this phenomenon has been shown empirically in simulation, but the underlying mathematics are not yet understood. By understanding the mathematics, we could develop algorithms that reduce or eliminate this penalty of increased teamwork.

In this paper we investigate the team uncertainty penalty on two fronts. First, we provide results of robots exhibiting the same behavior seen in simulations. Second, we present a mathematical foundation by which to analyze the phenomenon. Using this model, we present findings indicating that the team uncertainty penalty is inherent to the level of teamwork allowed, rather than to specific algorithms.

@inproceedings(DCR10-Alfeld,
author={Scott Alfeld and Matthew E. Taylor and Prateek Tandon and Milind Tambe},
title={{Towards a Theoretic Understanding of {DCEE}}},
booktitle={{Proceedings of the Distributed Constraint Reasoning workshop ({AAMAS})}},
month={May},
year={2010},
wwwnote={<a href="https://www.cs.drexel.edu/dcr2010">DCR-10</a>},
abstract={Common wisdom says that the greater the level of teamwork, the higher the performance of the team. In teams of cooperative autonomous agents, working together rather than independently can increase the team reward. However, recent results show that in uncertain environments, increasing the level of teamwork can actually decrease overall performance. Coined the team uncertainty penalty, this phenomenon has been shown empirically in simulation, but the underlying mathematics are not yet understood. By understanding the mathematics, we could develop algorithms that reduce or eliminate this penalty of increased teamwork. <br> In this paper we investigate the team uncertainty penalty on two fronts. First, we provide results of robots exhibiting the same behavior seen in simulations. Second, we present a mathematical foundation by which to analyze the phenomenon. Using this model, we present findings indicating that the team uncertainty penalty is inherent to the level of teamwork allowed, rather than to specific algorithms.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={DCOP, Robotics},
)

• Samuel Barrett, Matthew E. Taylor, and Peter Stone. Transfer Learning for Reinforcement Learning on a Physical Robot. In Proceedings of the Adaptive and Learning Agents workshop (AAMAS), May 2010.

As robots become more widely available, many capabilities that were once only practical to develop and test in simulation are becoming feasible on real, physically grounded, robots. This newfound feasibility is important because simulators rarely represent the world with sufficient fidelity that developed behaviors will work as desired in the real world. However, development and testing on robots remains difficult and time-consuming, so it is desirable to minimize the number of trials needed when developing robot behaviors.

This paper focuses on reinforcement learning (RL) on physically grounded robots. A few noteworthy exceptions notwithstanding, RL has typically been done purely in simulation, or, at best, initially in simulation with the eventual learned behaviors run on a real robot. However, some recent RL methods exhibit sufficiently low sample complexity to enable learning entirely on robots. One such method is transfer learning for RL. The main contribution of this paper is the first empirical demonstration that transfer learning can significantly speed up and even improve asymptotic performance of RL done entirely on a physical robot. In addition, we show that transferring information learned in simulation can bolster additional learning on the robot.

@inproceedings(ALA10-Barrett,
author={Samuel Barrett and Matthew E. Taylor and Peter Stone},
title={{Transfer Learning for Reinforcement Learning on a Physical Robot}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month={May},
year={2010},
wwwnote={<a href="http://www-users.cs.york.ac.uk/~grzes/ala10/">ALA-10</a>},
abstract={ As robots become more widely available, many capabilities that were once only practical to develop and test in simulation are becoming feasible on real, physically grounded, robots. This newfound feasibility is important because simulators rarely represent the world with sufficient fidelity that developed behaviors will work as desired in the real world. However, development and testing on robots remains difficult and time consuming, so it is desirable to minimize the number of trials needed when developing robot behaviors. <br> This paper focuses on reinforcement learning (RL) on physically grounded robots. A few noteworthy exceptions notwithstanding, RL has typically been done purely in simulation, or, at best, initially in simulation with the eventual learned behaviors run on a real robot. However, some recent RL methods exhibit sufficiently low sample complexity to enable learning entirely on robots. One such method is transfer learning for RL. The main contribution of this paper is the first empirical demonstration that transfer learning can significantly speed up and even improve asymptotic performance of RL done entirely on a physical robot. In addition, we show that transferring information learned in simulation can bolster additional learning on the robot.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning, Robotics},
)

• Matthew E. Taylor and Sonia Chernova. Integrating Human Demonstration and Reinforcement Learning: Initial Results in Human-Agent Transfer. In Proceedings of the Agents Learning Interactively from Human Teachers workshop (AAMAS), May 2010.

This work introduces Human-Agent Transfer (HAT), a method that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations can be transferred into a baseline policy for an agent, and reinforcement learning can be used to significantly improve policy performance. These results are an important initial step, suggesting that agents can not only quickly learn to mimic human actions, but can also learn to surpass the abilities of the teacher.

@inproceedings(ALIHT10-Taylor,
author={Matthew E. Taylor and Sonia Chernova},
title={{Integrating Human Demonstration and Reinforcement Learning: Initial Results in Human-Agent Transfer}},
booktitle={{Proceedings of the Agents Learning Interactively from Human Teachers workshop ({AAMAS})}},
month={May},
year={2010},
abstract={This work introduces Human-Agent Transfer (HAT), a method that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations can be transferred into a baseline policy for an agent, and reinforcement learning can be used to significantly improve policy performance. These results are an important initial step, suggesting that agents can not only quickly learn to mimic human actions, but can also learn to surpass the abilities of the teacher.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
)

### 2009

• Manish Jain, Matthew E. Taylor, Makoto Yokoo, and Milind Tambe. DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks. In Proceedings of the Third International Workshop on Agent Technology for Sensor Networks (AAMAS), May 2009.

Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.

@inproceedings(ATSN09-Jain,
author={Manish Jain and Matthew E. Taylor and Makoto Yokoo and Milind Tambe},
title={{{DCOP}s Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks}},
booktitle={{Proceedings of the Third International Workshop on Agent Technology for Sensor Networks ({AAMAS})}},
month={May},
year= {2009},
wwwnote={<a href="http://www.atsn09.org">ATSN-2009</a><br>Superseded by the IJCAI-09 conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-IJCAI09-Jain.html">DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks</a>.},
abstract={Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={DCOP, Robotics},
bib2html_funding={DARPA}
)

• Jun-young Kwak, Pradeep Varakantham, Matthew E. Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping. In Proceedings of the Fourth Workshop on Multi-agent Sequential Decision-Making in Uncertain Domains (AAMAS), May 2009.

While distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, a novel algorithm to solve such distributed POMDPs. Two major novelties in TREMOR are (i) the use of social model shaping to coordinate agents, and (ii) the harnessing of efficient single-agent POMDP solvers. Experimental results demonstrate that TREMOR may provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

@inproceedings(MSDM09-Kwak,
author={Jun-young Kwak and Pradeep Varakantham and Matthew E. Taylor and Janusz Marecki and Paul Scerri and Milind Tambe},
title={{Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping}},
booktitle={{Proceedings of the Fourth Workshop on Multi-agent Sequential Decision-Making in Uncertain Domains ({AAMAS})}},
month={May},
year= {2009},
wwwnote={<a href="http://www.eecs.harvard.edu/~seuken/msdm2009/">MSDM-2009</a><br> Superseded by the ICAPS-09 conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-ICAPS09-Varakantham.html">Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping</a>.},
abstract={While distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, a novel algorithm to solve such distributed POMDPs. Two major novelties in TREMOR are (i) the use of social model shaping to coordinate agents, and (ii) the harnessing of efficient single-agent POMDP solvers. Experimental results demonstrate that TREMOR may provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Distributed POMDPs},
bib2html_funding={ARMY}
)

• Matthew E. Taylor, Chris Kiekintveld, Craig Western, and Milind Tambe. Beyond Runtimes and Optimality: Challenges and Opportunities in Evaluating Deployed Security Systems. In Proceedings of the AAMAS-09 Workshop on Agent Design: Advancing from Practice to Theory, May 2009.

As multi-agent research transitions into the real world, evaluation becomes an increasingly important challenge. One can run controlled and repeatable tests in a laboratory environment, but such tests may be difficult, or even impossible, once the system is deployed. Furthermore, traditional metrics used by computer scientists, such as runtime analysis, may be largely irrelevant.

@inproceedings(ADAPT09-Taylor,
author={Matthew E. Taylor and Chris Kiekintveld and Craig Western and Milind Tambe},
title={{Beyond Runtimes and Optimality: Challenges and Opportunities in Evaluating Deployed Security Systems}},
booktitle={{Proceedings of the {AAMAS}-09 Workshop on Agent Design: Advancing from Practice to Theory}},
month={May},
year={2009},
abstract={ As multi-agent research transitions into the real world, evaluation becomes an increasingly important challenge. One can run controlled and repeatable tests in a laboratory environment, but such tests may be difficult, or even impossible, once the system is deployed. Furthermore, traditional metrics used by computer scientists, such as runtime analysis, may be largely irrelevant.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Security},
bib2html_funding={CREATE}
)

• Matthew E. Taylor and Peter Stone. Categorizing Transfer for Reinforcement Learning. In Poster at the Multidisciplinary Symposium on Reinforcement Learning, June 2009.
@inproceedings{MSRL09-Taylor,
author={Matthew E. Taylor and Peter Stone},
title={{Categorizing Transfer for Reinforcement Learning}},
booktitle={{Poster at the Multidisciplinary Symposium on Reinforcement Learning}},
month={June},
year={2009},
wwwnote={<a href="http://msrl09.rl-community.org/">MSRL-09</a>.},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_pubtype={Refereed Workshop or Symposium},
}

• Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains. In Proceedings of the IJCAI 2009 Workshop on Distributed Constraint Reasoning, July 2009.

Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents’ decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the \emph{Distributed Constraint Optimization Problem} framework and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems.

@inproceedings(DCR09-Taylor,
author={Matthew E. Taylor and Manish Jain and Prateek Tandon and Milind Tambe},
title={{Using {DCOP}s to Balance Exploration and Exploitation in Time-Critical Domains}},
booktitle={{Proceedings of the {IJCAI} 2009 Workshop on Distributed Constraint Reasoning}},
month={July},
year={2009},
wwwnote={<a href="http://www-scf.usc.edu/~wyeoh/DCR09/">DCR-2009</a>},
abstract={Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents' decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the \emph{Distributed Constraint Optimization Problem} framework and demonstrates when, and at what cost, increasing agents' coordination can improve the global reward on such problems.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={DCOP},
bib2html_funding={ARMY}
)
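The setting in the abstract above (agents accruing on-line reward that decomposes over pairs of decisions, with the reward function initially unknown) can be illustrated with a toy rule. This sketch is purely for intuition: the function names are hypothetical and the "try every joint choice once, then commit" rule stands in for, and is much simpler than, the paper's DCOP-based algorithms.

```python
from itertools import product

def explore_then_commit(true_reward, choices, horizon):
    """Two linked agents: sample each joint choice once, then repeat the best.

    true_reward maps a (choice_a, choice_b) pair to its pairwise reward;
    the total on-line reward over `horizon` rounds is what matters.
    """
    total, best_pair, best_r = 0.0, None, float("-inf")
    rounds = 0
    for pair in product(choices, choices):        # exploration phase
        if rounds == horizon:
            break
        r = true_reward[pair]
        total += r
        rounds += 1
        if r > best_r:
            best_r, best_pair = r, pair
    total += best_r * (horizon - rounds)          # exploitation phase
    return total, best_pair

reward = {("a", "a"): 1.0, ("a", "b"): 0.2, ("b", "a"): 0.0, ("b", "b"): 0.6}
total, best = explore_then_commit(reward, ["a", "b"], horizon=10)
# 4 exploration rounds earn 1.8, then 6 rounds of the best pair earn 6.0
```

The tension the paper studies is visible even here: every exploration round spent on a bad pair costs on-line reward, but committing too early risks missing the best joint decision.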

• Matthew E. Taylor, Chris Kiekintveld, Craig Western, and Milind Tambe. Is There a Chink in Your ARMOR? Towards Robust Evaluations for Deployed Security Systems. In Proceedings of the IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications, July 2009.
@inproceedings(QRASA09-Taylor,
author={Matthew E. Taylor and Chris Kiekintveld and Craig Western and Milind Tambe},
title={{Is There a Chink in Your ARMOR? {T}owards Robust Evaluations for Deployed Security Systems}},
booktitle={{Proceedings of the {IJCAI} 2009 Workshop on Quantitative Risk Analysis for Security Applications}},
month={July},
year={2009},
wwwnote={<a href="http://teamcore.usc.edu/QRASA-09">QRASA-2009</a><br>Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-Informatica10-Taylor.html">A Framework for Evaluating Deployed Security Systems: Is There a Chink in your ARMOR?</a>.},
abstract={},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Security},
bib2html_funding={CREATE}
)

• Matthew E. Taylor. Assisting Transfer-Enabled Machine Learning Algorithms: Leveraging Human Knowledge for Curriculum Design. In The AAAI 2009 Spring Symposium on Agents that Learn from Human Teachers, March 2009.

Transfer learning is a successful technique that significantly improves machine learning algorithms by training on a sequence of tasks rather than a single task in isolation. However, there is currently no systematic method for deciding how to construct such a sequence of tasks. In this paper, I propose that while humans are well-suited for the task of curriculum development, significant research is still necessary to better understand how to create effective curricula for machine learning algorithms.

@inproceedings(AAAI09SS-Taylor,
author={Matthew E. Taylor},
title={{Assisting Transfer-Enabled Machine Learning Algorithms: Leveraging Human Knowledge for Curriculum Design}},
booktitle={{The {AAAI} 2009 Spring Symposium on Agents that Learn from Human Teachers}},
month={March},
year={2009},
abstract={Transfer learning is a successful technique that significantly improves machine learning algorithms by training on a sequence of tasks rather than a single task in isolation. However, there is
currently no systematic method for deciding how to construct such a sequence of tasks. In this paper, I propose that while humans are well-suited for the task of curriculum development, significant research is still necessary to better understand how to create effective curricula for machine learning algorithms.},
wwwnote={<a href="http://www.cc.gatech.edu/AAAI-SS09-LFH/Home.html">AAAI 2009 Spring Symposium on Agents that Learn from Human Teachers</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
bib2html_funding={}
)

• Jason Tsai, Emma Bowring, Shira Epstein, Natalie Fridman, Prakhar Garg, Gal Kaminka, Andrew Ogden, Milind Tambe, and Matthew E. Taylor. Agent-based Evacuation Modeling: Simulating the Los Angeles International Airport. In Proceedings of the Workshop on Emergency Management: Incident, Resource, and Supply Chain Management, November 2009.
@inproceedings(EMWS09-Tsai,
author={Jason Tsai and Emma Bowring and Shira Epstein and Natalie Fridman and Prakhar Garg and Gal Kaminka and Andrew Ogden and Milind Tambe and Matthew E. Taylor},
title={{Agent-based Evacuation Modeling: Simulating the Los Angeles International Airport}},
booktitle={{Proceedings of the Workshop on Emergency Management: Incident, Resource, and Supply Chain Management}},
month={November},
year={2009},
wwwnote={<a href="http://www.ics.uci.edu/~projects/cert/EMWS09">EMWS09-2009</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Security},
)

• Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone. Generalized Domains for Empirical Evaluations in Reinforcement Learning. In Proceedings of the Fourth Workshop on Evaluation Methods for Machine Learning at ICML-09, June 2009.

Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to method overfitting, wherein results may not generalize to similar environments. To address this problem, we advocate empirical evaluations using generalized domains: parameterized problem generators that explicitly encode variations in the environment to which the learner should be robust. We argue that evaluating across a set of these generated problems offers a more meaningful evaluation of reinforcement learning algorithms.

@inproceedings(ICMLWS09-Whiteson,
author={Shimon Whiteson and Brian Tanner and Matthew E. Taylor and Peter Stone},
title={{Generalized Domains for Empirical Evaluations in Reinforcement Learning}},
booktitle={{Proceedings of the Fourth Workshop on Evaluation Methods for Machine Learning at {ICML}-09}},
month={June},
year={2009},
wwwnote={<a href="http://www.site.uottawa.ca/ICML09WS/">Fourth annual workshop on Evaluation Methods for Machine Learning</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
bib2html_funding={},
abstract={Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to method overfitting, wherein results may not generalize to similar environments. To address this problem, we advocate empirical evaluations using generalized domains: parameterized problem generators that explicitly encode variations in the environment to which the learner should be robust. We argue that evaluating across a set of these generated problems offers a more meaningful evaluation of reinforcement learning algorithms.},
)

### 2008

• Matthew E. Taylor, Nicholas K. Jong, and Peter Stone. Transferring Instances for Model-Based Reinforcement Learning. In The Adaptive Learning Agents and Multi-Agent Systems (ALAMAS+ALAG) workshop at AAMAS, May 2008.

\emph{Reinforcement learning} agents typically require a significant amount of data before performing well on complex tasks. \emph{Transfer learning} methods have made progress reducing sample complexity, but they have only been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample complexity and asymptotic performance of a model-based algorithm when learning in a continuous state space.

@inproceedings(AAMAS08-ALAMAS-Taylor,
author={Matthew E. Taylor and Nicholas K. Jong and Peter Stone},
title={{Transferring Instances for Model-Based Reinforcement Learning}},
booktitle={{The Adaptive Learning Agents and Multi-Agent Systems ({ALAMAS+ALAG}) workshop at {AAMAS}}},
month={May},
year={2008},
abstract = {\emph{Reinforcement learning} agents typically require a significant amount of data before performing well on complex tasks. \emph{Transfer learning} methods have made progress reducing sample complexity, but they have only been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample complexity and asymptotic performance of a model-based algorithm when learning in a continuous state space.},
wwwnote={<a href="http://ki.informatik.uni-wuerzburg.de/~kluegl/ALAMAS.ALAg/">AAMAS 2008 workshop on Adaptive Learning Agents and Multi-Agent Systems</a><br> Superseded by the ECML-08 conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-ECML08-Taylor.html">Transferring Instances for Model-Based Reinforcement Learning</a>.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning, Planning},
bib2html_funding={NSF, DARPA}
)

### 2007

• Matthew E. Taylor and Peter Stone. Representation Transfer for Reinforcement Learning. In AAAI 2007 Fall Symposium on Computational Approaches to Representation Change during Learning and Development, November 2007.

Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer learning methods are able to successfully transfer knowledge from a source reinforcement learning task into a target task, reducing learning time. However, the complementary task of transferring knowledge between agents with different internal representations has not been well explored. The goal in both types of transfer problems is the same: reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work defines representation transfer, contrasts it with task transfer, and introduces two novel algorithms. Additionally, we show representation transfer algorithms can also be successfully used for task transfer, providing an empirical connection between the two problems. These algorithms are fully implemented in a complex multiagent domain and experiments demonstrate that transferring the learned knowledge between different representations is both possible and beneficial.

@inproceedings(AAAI07-Symposium,
author={Matthew E. Taylor and Peter Stone},
title={{Representation Transfer for Reinforcement Learning}},
booktitle={{{AAAI} 2007 Fall Symposium on Computational Approaches to Representation Change during Learning and Development}},
month={November},
year={2007},
abstract={Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer learning methods are able to successfully transfer knowledge from a source reinforcement learning task into a target task, reducing learning time. However, the complementary task of transferring knowledge between agents with different internal representations has not been well explored. The goal in both types of transfer problems is the same: reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work defines representation transfer, contrasts it with task transfer, and introduces two novel algorithms. Additionally, we show representation transfer algorithms can also be successfully used for task transfer, providing an empirical connection between the two problems. These algorithms are fully implemented in a complex multiagent domain and experiments demonstrate that transferring the learned knowledge between different representations is both possible and beneficial.},
wwwnote={<a href="http://yertle.isi.edu/~clayton/aaai-fss07/index.php/Welcome">2007 AAAI Fall Symposium: Computational Approaches to Representation Change during Learning and Development</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_funding={DARPA, NSF},
)

• Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. Accelerating Search with Transferred Heuristics. In ICAPS-07 workshop on AI Planning and Learning, September 2007.

@inproceedings(ICAPS07WS-taylor,
author={Matthew E. Taylor and Gregory Kuhlmann and Peter Stone},
title={{Accelerating Search with Transferred Heuristics}},
booktitle={{{ICAPS}-07 workshop on AI Planning and Learning}},
month={September},
year={2007},
wwwnote={<a href="http://www.cs.umd.edu/users/ukuter/icaps07aipl/">ICAPS 2007 workshop on AI Planning and Learning</a>},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Planning},
bib2html_funding={NSF, DARPA}
)

• Matthew E. Taylor, Katherine E. Coons, Behnam Robatmili, Doug Burger, and Kathryn S. McKinley. Policy Search Optimization for Spatial Path Planning. In NIPS-07 workshop on Machine Learning for Systems Problems, December 2007. (Two page extended abstract.)
@inproceedings(NIPS07-taylor,
author={Matthew E. Taylor and Katherine E. Coons and Behnam Robatmili and Doug Burger and Kathryn S. McKinley},
title={{Policy Search Optimization for Spatial Path Planning}},
booktitle={{{NIPS}-07 workshop on Machine Learning for Systems Problems}},
month={December},
year={2007},
note={(Two page extended abstract.)},
wwwnote={<a href="http://radlab.cs.berkeley.edu/MLSys/">NIPS 2007 workshop on Machine Learning for Systems Problems</a><br> Superseded by the PACT-08 conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-PACT08-Coons.html">Using Reinforcement Learning to Select Policy Features for Distributed Instruction Placement</a>.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Autonomic Computing, Machine Learning in Practice},
)

### 2006

• Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Transfer Learning for Policy Search Methods. In ICML workshop on Structural Knowledge Transfer for Machine Learning, June 2006.

An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference approach to transfer in reinforcement learning tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies trained via genetic algorithms (GAs) from a source task to a target task. Empirical results in robot soccer Keepaway, a standard RL benchmark domain, demonstrate that transfer via inter-task mapping can markedly reduce the time required to learn a second, more complex, task.

@inproceedings(ICML06-taylor,
author={Matthew E. Taylor and Shimon Whiteson and Peter Stone},
title={{Transfer Learning for Policy Search Methods}},
booktitle={{{ICML} workshop on Structural Knowledge Transfer for Machine Learning}},
month={June},
year={2006},
abstract={An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference approach to transfer in reinforcement learning tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies trained via genetic algorithms (GAs) from a source task to a target task. Empirical results in robot soccer Keepaway, a standard RL benchmark domain, demonstrate that transfer via inter-task mapping can markedly reduce the time required to learn a second, more complex, task.},
wwwnote={<a href="http://www.cs.utexas.edu/~banerjee/icmlws06/">ICML-2006 workshop on Structural Knowledge Transfer for Machine Learning</a>.<br> Superseded by the conference paper <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-AAMAS07-taylor.html">Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning</a>.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
bib2html_funding={NSF, DARPA}
)

• Shimon Whiteson, Matthew E. Taylor, and Peter Stone. Adaptive Tile Coding for Reinforcement Learning. In NIPS workshop on: Towards a New Reinforcement Learning?, December 2006.
@inproceedings(NIPS06-Whiteson,
author={Shimon Whiteson and Matthew E. Taylor and Peter Stone},
title={{Adaptive Tile Coding for Reinforcement Learning}},
booktitle={{{NIPS} workshop on: Towards a New Reinforcement Learning?}},
month={December},
year={2006},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
bib2html_funding={NSF, DARPA},
wwwnote={<a href="http://nips.cc/Conferences/2006">NIPS-2006</a> (Poster).<br> Superseded by the technical report <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-whitesontr07.html">Adaptive Tile Coding for Value Function Approximation</a>.},
)
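The entry above builds on standard tile coding, which represents a continuous state with several overlapping, offset grids ("tilings") whose active cells index a weight vector. As background, here is a minimal fixed-width sketch; the paper's contribution is adapting the tiling online, which this version does not attempt, and the function names are illustrative.

```python
def active_tiles(x, n_tilings=4, tiles_per_dim=8, lo=0.0, hi=1.0):
    """Return one active tile index per tiling for a scalar state x in [lo, hi]."""
    tiles = []
    width = (hi - lo) / tiles_per_dim
    for t in range(n_tilings):
        offset = t * width / n_tilings          # each tiling is shifted slightly
        idx = int((x - lo + offset) / width)
        idx = min(idx, tiles_per_dim)           # clamp the shifted upper edge
        tiles.append(t * (tiles_per_dim + 1) + idx)
    return tiles

def value(x, weights):
    """Approximate value: sum of the weights of the active tiles."""
    return sum(weights[i] for i in active_tiles(x))
```

Because nearby states share most of their active tiles, updating the weights for one state generalizes to its neighbors; the coarseness of that generalization is exactly what adaptive tile coding tunes.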

### 2004

• Matthew E. Taylor and Peter Stone. Speeding up Reinforcement Learning with Behavior Transfer. In AAAI 2004 Fall Symposium on Real-life Reinforcement Learning, October 2004.

Reinforcement learning (RL) methods have become popular machine learning techniques in recent years. RL has had some experimental successes and has been shown to exhibit some desirable properties in theory, but it has often been found very slow in practice. In this paper we introduce \emph{behavior transfer}, a novel approach to speeding up traditional RL. We present experimental results showing a learner is able to learn one task and then use behavior transfer to markedly reduce the total training time for a more complex task.

@inproceedings{AAAI04-Symposium,
author={Matthew E. Taylor and Peter Stone},
title={{Speeding up Reinforcement Learning with Behavior Transfer}},
booktitle={{{AAAI} 2004 Fall Symposium on Real-life Reinforcement Learning}},
month={October},
year={2004},
abstract={Reinforcement learning (RL) methods have become popular machine learning techniques in recent years. RL has had some experimental successes and has been shown to exhibit some desirable properties in theory, but it has often been found very slow in practice. In this paper we introduce \emph{behavior transfer}, a novel approach to speeding up traditional RL. We present experimental results showing a learner is able to learn one task and then use behavior transfer to markedly reduce the total training time for a more complex task.},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_funding={NSF},
wwwnote={Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-JMLR07-taylor.html">Transfer Learning via Inter-Task Mappings for Temporal Difference Learning</a>.}
}
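The behavior-transfer idea above — learn a simple task, then reuse that knowledge to start a harder task — can be sketched in tabular form: seed the target task's Q-table by mapping each target state and action back to a source entry. All names and the inter-task mapping below are hypothetical; the paper works in simulated robot soccer with function approximation rather than tables.

```python
def transfer_q(source_q, state_map, action_map, target_states, target_actions):
    """Build an initial target Q-table by looking up mapped source entries.

    state_map / action_map translate target states and actions into the
    source task's (typically smaller) state and action spaces.
    """
    target_q = {}
    for s in target_states:
        for a in target_actions:
            src_key = (state_map(s), action_map(a))
            # Fall back to 0.0 when the mapped entry was never visited.
            target_q[(s, a)] = source_q.get(src_key, 0.0)
    return target_q

# Tiny hypothetical example: the target task has an extra state feature and
# two variants of the source's "pass" action.
source_q = {(("near",), "hold"): 1.0, (("far",), "pass"): 0.5}
target_q = transfer_q(
    source_q,
    state_map=lambda s: (s[0],),                          # drop the extra feature
    action_map=lambda a: "pass" if a.startswith("pass") else a,
    target_states=[("near", "x"), ("far", "y")],
    target_actions=["hold", "pass1", "pass2"],
)
```

The transferred table is only a starting point: the learner continues training in the target task, so an imperfect mapping costs refinement time rather than final performance.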