Intelligent Robot Learning Laboratory (IRL Lab) Conference Papers

2017

• Santosh Bhusal, Shivam Goel, Kapil Khanal, Matthew E. Taylor, and Manoj Karkee. Bird Detection, Tracking and Counting in Wine Grapes. In 2017 ASABE Annual International Meeting, page 1. American Society of Agricultural and Biological Engineers, 2017. This is not a peer-reviewed article

Bird damage is a critical problem in wine grapes, blueberries, and other fruit crops, especially during the weeks close to the harvesting period. Small birds such as Starlings, Robins, and Finches feed extensively on wine grapes. Automated detection, localization, and tracking of these birds in the field will be necessary to identify the best locations for installing bird-scaring devices as well as to support autonomous UAS operations to deter them. A section of a wine grape plot (~30 m x 30 m) was constantly monitored using four GoPro cameras installed at the four corners of the plot. Videos were recorded at 1080p resolution at 30 frames per second. In this paper, a Gaussian mixture-based background/foreground segmentation algorithm was used to detect birds flying in and out of the wine grape plot. This algorithm can detect moving objects in a video irrespective of their shape, size, and color. Detected birds were tracked over time using a Kalman filter. A field boundary was then defined to estimate the count of birds flying in and out of the plot through the boundary. Two performance measures, precision and recall (sensitivity), were used to analyze the accuracy of the counting method. Precision refers to the usefulness of the system and recall measures its completeness. Results showed that the proposed method can achieve a precision of 85% in counting birds entering or leaving a crop field, with a sensitivity of 87%. Such a system could have a wide range of applications where birds' presence is a problem, such as crop fields, airports, and cattle farms.

@inproceedings{2017ASABE-Bhusal,
title={{Bird Detection, Tracking and Counting in Wine Grapes}},
author={Bhusal, Santosh and Goel, Shivam and Khanal, Kapil and Taylor, Matthew E. and Karkee, Manoj},
booktitle={2017 {ASABE} Annual International Meeting},
pages={1},
year={2017},
doi={10.13031/aim.201700300},
note={This is not a peer-reviewed article},
organization={American Society of Agricultural and Biological Engineers},
abstract={Bird damage is a critical problem in wine grapes, blueberries, and other fruit crops, especially during the weeks close to the harvesting period. Small birds such as Starlings, Robins, and Finches feed extensively on wine grapes. Automated detection, localization, and tracking of these birds in the field will be necessary to identify the best locations for installing bird-scaring devices as well as to support autonomous UAS operations to deter them. A section of a wine grape plot (~30 m x 30 m) was constantly monitored using four GoPro cameras installed at the four corners of the plot. Videos were recorded at 1080p resolution at 30 frames per second. In this paper, a Gaussian mixture-based background/foreground segmentation algorithm was used to detect birds flying in and out of the wine grape plot. This algorithm can detect moving objects in a video irrespective of their shape, size, and color. Detected birds were tracked over time using a Kalman filter. A field boundary was then defined to estimate the count of birds flying in and out of the plot through the boundary. Two performance measures, precision and recall (sensitivity), were used to analyze the accuracy of the counting method. Precision refers to the usefulness of the system and recall measures its completeness. Results showed that the proposed method can achieve a precision of 85% in counting birds entering or leaving a crop field, with a sensitivity of 87%. Such a system could have a wide range of applications where birds' presence is a problem, such as crop fields, airports, and cattle farms.}
}

• Salam El Bsat, Haitham Bou Ammar, and Matthew E. Taylor. Scalable Multitask Policy Gradient Reinforcement Learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), February 2017. 25% acceptance rate

Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach that aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer from scalability issues when considering a large number of tasks. The main reason behind this limitation is the reliance on centralized solutions. This paper proposes a novel distributed multitask RL framework, improving scalability across many different types of tasks. Our framework maps multitask RL to an instance of general consensus and develops an efficient decentralized solver. We justify the correctness of the algorithm both theoretically and empirically: we first prove an improvement of the convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks.

@inproceedings{2017AAAI-ElBsat,
author={El Bsat, Salam and Bou Ammar, Haitham and Taylor, Matthew E.},
title={{Scalable Multitask Policy Gradient Reinforcement Learning}},
booktitle={{Proceedings of the 31st {AAAI} Conference on Artificial Intelligence ({AAAI})}},
month={February},
year={2017},
note={25\% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
abstract={Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach that aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer from scalability issues when considering a large number of tasks. The main reason behind this limitation is the reliance on centralized solutions. This paper proposes a novel distributed multitask RL framework, improving scalability across many different types of tasks. Our framework maps multitask RL to an instance of general consensus and develops an efficient decentralized solver. We justify the correctness of the algorithm both theoretically and empirically: we first prove an improvement of the convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks.}
}

• Shivam Goel, Santosh Bhusal, Matthew E. Taylor, and Manoj Karkee. Detection and Localization of Birds for Bird Deterrence Using UAS. In 2017 ASABE Annual International Meeting, page 1. American Society of Agricultural and Biological Engineers, 2017. This is not a peer-reviewed article

Cherry, grape, and blueberry growers lose around 80 million dollars annually to bird damage in the state of Washington alone. Growers of a wide range of crops have a critical need for a safe and cost-effective method for persistent bird deterrence, which would lead to significantly reduced production costs. The goal of this research is to build a completely autonomous Unmanned Aerial System (UAS) to deter birds from blueberry fields and grape vineyards. The most vital part of such a UAS is its vision system, and the primary objective of this paper is to build a system that detects and localizes birds. To detect birds, several background subtraction algorithms were applied and their performance was measured. We found that ViBe, a background subtraction algorithm, performs best in the bird detection scenario, providing an accuracy of 63%. To improve detection speed toward real-time operation, a split-window technique is used, increasing detection speed by 13%. To estimate the distance of a detected bird, a stereo vision system is proposed. With our current system, an accurate measure of the distance to the object is possible from 2 to 7 meters, with an error within 30 centimeters. The long-term goal is to combine these efforts to create a completely autonomous Smart Scarecrow that can safely, effectively, and reliably scare and deter birds away from high-value crops.

@inproceedings{2017ASABE-Goel,
author={Goel, Shivam and Bhusal, Santosh and Taylor, Matthew E. and Karkee, Manoj},
title={{Detection and Localization of Birds for Bird Deterrence Using UAS}},
booktitle={{2017 {ASABE} Annual International Meeting}},
pages={1},
year={2017},
doi={10.13031/aim.201701288},
note={This is not a peer-reviewed article},
organization={American Society of Agricultural and Biological Engineers},
abstract={Cherry, grape, and blueberry growers lose around 80 million dollars annually to bird damage in the state of Washington alone. Growers of a wide range of crops have a critical need for a safe and cost-effective method for persistent bird deterrence, which would lead to significantly reduced production costs. The goal of this research is to build a completely autonomous Unmanned Aerial System (UAS) to deter birds from blueberry fields and grape vineyards. The most vital part of such a UAS is its vision system, and the primary objective of this paper is to build a system that detects and localizes birds. To detect birds, several background subtraction algorithms were applied and their performance was measured. We found that ViBe, a background subtraction algorithm, performs best in the bird detection scenario, providing an accuracy of 63%. To improve detection speed toward real-time operation, a split-window technique is used, increasing detection speed by 13%. To estimate the distance of a detected bird, a stereo vision system is proposed. With our current system, an accurate measure of the distance to the object is possible from 2 to 7 meters, with an error within 30 centimeters. The long-term goal is to combine these efforts to create a completely autonomous Smart Scarecrow that can safely, effectively, and reliably scare and deter birds away from high-value crops.}
}

• James MacGlashan, Mark K. Ho, Robert Loftin, Bei Peng, Guan Wang, David L. Roberts, Matthew E. Taylor, and Michael L. Littman. Interactive Learning from Policy-Dependent Human Feedback. In Proceedings of the 34th International Conference on Machine Learning (ICML), August 2017. 25% acceptance rate

This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has assumed that people provide feedback for decisions that depends on the behavior they are teaching and is independent of the learner’s current policy. We present empirical results that show this assumption to be false—whether human trainers give positive or negative feedback for a decision is influenced by the learner’s current policy. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot.

@inproceedings{2017ICML-Macglashan,
author={James MacGlashan and Mark K. Ho and Robert Loftin and Bei Peng and Guan Wang and David L. Roberts and Matthew E. Taylor and Michael L. Littman},
title={{Interactive Learning from Policy-Dependent Human Feedback}},
booktitle={{Proceedings of the 34th International Conference on Machine Learning ({ICML})}},
month={August},
year={2017},
note={25\% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning, Human-robot Interaction},
abstract={This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has assumed that people provide feedback for decisions that depends on the behavior they are teaching and is independent of the learner's current policy. We present empirical results that show this assumption to be false---whether human trainers give positive or negative feedback for a decision is influenced by the learner's current policy. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot.}
}

• Ariel Rosenfeld, Matthew E. Taylor, and Sarit Kraus. Leveraging Human Knowledge in Tabular Reinforcement Learning: A Study of Human Subjects. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), August 2017. 26% acceptance rate

Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible approaches. In this paper, we propose and evaluate a novel method, based on human psychology literature, which we show to be both effective and efficient, for both expert and non-expert designers, in injecting human knowledge for speeding up tabular RL.

@inproceedings{2017IJCAI-Rosenfeld,
author={Rosenfeld, Ariel and Taylor, Matthew E. and Kraus, Sarit},
title={{Leveraging Human Knowledge in Tabular Reinforcement Learning: A Study of Human Subjects}},
booktitle={{Proceedings of the 26th International Joint Conference on Artificial Intelligence ({IJCAI})}},
month={August},
year={2017},
note={26\% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
abstract={Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible approaches. In this paper, we propose and evaluate a novel method, based on human psychology literature, which we show to be both effective and efficient, for both expert and non-expert designers, in injecting human knowledge for speeding up tabular RL.}
}

• Zhaodong Wang and Matthew E. Taylor. Improving Reinforcement Learning with Confidence-Based Demonstrations. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), August 2017. 26% acceptance rate

Reinforcement learning has had many successes, but in practice it often requires significant amounts of data to learn high-performing policies. One common way to improve learning is to allow a trained (source) agent to assist a new (target) agent. The goals in this setting are to 1) improve the target agent’s performance, relative to learning unaided, and 2) allow the target agent to outperform the source agent. Our approach leverages source agent demonstrations, removing any requirements on the source agent’s learning algorithm or representation. The target agent then estimates the source agent’s policy and improves upon it. The key contribution of this work is to show that leveraging the target agent’s uncertainty in the source agent’s policy can significantly improve learning in two complex simulated domains, Keepaway and Mario.

@inproceedings{2017IJCAI-Wang,
author={Wang, Zhaodong and Taylor, Matthew E.},
title={{Improving Reinforcement Learning with Confidence-Based Demonstrations}},
booktitle={{Proceedings of the 26th International Joint Conference on Artificial Intelligence ({IJCAI})}},
month={August},
year={2017},
note={26\% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
abstract={Reinforcement learning has had many successes, but in practice it often requires significant amounts of data to learn high-performing policies. One common way to improve learning is to allow a trained (source) agent to assist a new (target) agent. The goals in this setting are to 1) improve the target agent's performance, relative to learning unaided, and 2) allow the target agent to outperform the source agent. Our approach leverages source agent demonstrations, removing any requirements on the source agent's learning algorithm or representation. The target agent then estimates the source agent's policy and improves upon it. The key contribution of this work is to show that leveraging the target agent's uncertainty in the source agent's policy can significantly improve learning in two complex simulated domains, Keepaway and Mario.}
}

2016

• Chris Cain, Anne Anderson, and Matthew E. Taylor. Content-Independent Classroom Gamification. In Proceedings of the ASEE’s 123rd Annual Conference & Exposition, New Orleans, LA, USA, June 2016.

This paper introduces the Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person study was conducted in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.

@inproceedings{2016ASEE-Cain,
author={Chris Cain and Anne Anderson and Matthew E. Taylor},
title={{Content-Independent Classroom Gamification}},
booktitle={{Proceedings of the {ASEE}'s 123rd Annual Conference \& Exposition}},
month={June},
year={2016},
address={New Orleans, LA, USA},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Gamification, Motivation, Education},
abstract={This paper introduces the Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person study was conducted in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.}
}

• Yang Hu and Matthew E. Taylor. Work In Progress: A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility. In Proceedings of the ASEE’s 123rd Annual Conference & Exposition, New Orleans, LA, USA, June 2016.

Taking a Computer-Aided Design (CAD) class is a prerequisite for Mechanical Engineering freshmen at many universities, including at Washington State University. The traditional way to learn CAD software is to follow examples and exercises in a textbook. However, using written instruction is not always effective because textbooks usually support a single strategy to construct a model. Missing even one detail may cause the student to become stuck, potentially leading to frustration. To make the learning process easier and more interesting, we designed and implemented an intelligent tutoring system for an open source CAD program, FreeCAD, to teach students basic CAD skills (such as Boolean operations) for constructing complex objects from multiple simple shapes. Instead of teaching a single method to construct a model, the program first automatically learns all possible ways to construct a model and then can teach the student to draw the 3D model in multiple ways. Previous research efforts have shown that learning multiple potential solutions can encourage students to develop the tools they need to solve new problems. This study compares textbook learning with learning from two variants of our intelligent tutoring system. The textbook approach is considered the baseline. In the first tutorial variant, subjects were given minimal guidance and were asked to construct a model in multiple ways. Subjects in the second tutorial group were given two guided solutions to constructing a model and then asked to demonstrate a third solution when constructing the same model. Rather than directly providing instructions, participants in the second tutorial group were expected to explore independently and were only provided feedback when the program determined they had deviated too far from a potential solution. The three groups are compared by measuring the time needed to 1) successfully construct the same model in a testing phase, 2) use multiple methods to construct the same model in a testing phase, and 3) construct a novel model.

@inproceedings{2016ASEE-Hu,
author={Yang Hu and Matthew E. Taylor},
title={{Work In Progress: A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility}},
booktitle={{Proceedings of the {ASEE}'s 123rd Annual Conference \& Exposition}},
month={June},
year={2016},
address={New Orleans, LA, USA},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Intelligent Tutoring System, Multiple solutions},
abstract={Taking a Computer-Aided Design (CAD) class is a prerequisite for Mechanical Engineering freshmen at many universities, including at Washington State University. The traditional way to learn CAD software is to follow examples and exercises in a textbook. However, using written instruction is not always effective because textbooks usually support a single strategy to construct a model. Missing even one detail may cause the student to become stuck, potentially leading to frustration.
To make the learning process easier and more interesting, we designed and implemented an intelligent tutoring system for an open source CAD program, FreeCAD, to teach students basic CAD skills (such as Boolean operations) for constructing complex objects from multiple simple shapes. Instead of teaching a single method to construct a model, the program first automatically learns all possible ways to construct a model and then can teach the student to draw the 3D model in multiple ways. Previous research efforts have shown that learning multiple potential solutions can encourage students to develop the tools they need to solve new problems.
This study compares textbook learning with learning from two variants of our intelligent tutoring system. The textbook approach is considered the baseline. In the first tutorial variant, subjects were given minimal guidance and were asked to construct a model in multiple ways. Subjects in the second tutorial group were given two guided solutions to constructing a model and then asked to demonstrate a third solution when constructing the same model. Rather than directly providing instructions, participants in the second tutorial group were expected to explore independently and were only provided feedback when the program determined they had deviated too far from a potential solution. The three groups are compared by measuring the time needed to 1) successfully construct the same model in a testing phase, 2) use multiple methods to construct the same model in a testing phase, and 3) construct a novel model.}
}

• David Isele, José Marcio Luna, Eric Eaton, Gabriel V. de la Cruz Jr., James Irwin, Brandon Kallaher, and Matthew E. Taylor. Lifelong Learning for Disturbance Rejection on Mobile Robots. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2016. 48% acceptance rate

No two robots are exactly the same—even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Furthermore, the approach is completely model-free, allowing us to apply this method to robots that have not been, or cannot be, fully modeled.

@inproceedings{2016IROS-Isele,
author={Isele, David and Luna, Jos\'e Marcio and Eaton, Eric and de la Cruz, Jr., Gabriel V. and Irwin, James and Kallaher, Brandon and Taylor, Matthew E.},
title={{Lifelong Learning for Disturbance Rejection on Mobile Robots}},
booktitle={{Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems ({IROS})}},
month={October},
year={2016},
note={48\% acceptance rate},
video={https://youtu.be/u7pkhLx0FQ0},
bib2html_pubtype={Refereed Conference},
abstract={No two robots are exactly the same—even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Furthermore, the approach is completely model-free, allowing us to apply this method to robots that have not been, or cannot be, fully modeled.}
}

• Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016. 24.9% acceptance rate

As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work presents a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.

@inproceedings{2016AAMAS-Peng,
author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
title={{A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans}},
booktitle={{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2016},
note={24.9\% acceptance rate},
bib2html_pubtype={Refereed Conference},
abstract={As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work presents a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.}
}

• Halit Bener Suay, Tim Brys, Matthew E. Taylor, and Sonia Chernova. Learning from Demonstration for Shaping through Inverse Reinforcement Learning. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016. 24.9% acceptance rate

Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel three-step approach to improve model-free reinforcement learning agents’ performance. Specifically, we collect demonstration data, use the data to recover a linear function via inverse reinforcement learning, and use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high-dimensional domains. To show the scalability of our approach we present two sets of experiments in a two-dimensional Maze domain and the 27-dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state of the art in cumulative reward, learning rate, and asymptotic performance.

@inproceedings{2016AAMAS-Suay,
author={Suay, Halit Bener and Brys, Tim and Taylor, Matthew E. and Chernova, Sonia},
title={{Learning from Demonstration for Shaping through Inverse Reinforcement Learning}},
booktitle={{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2016},
note={24.9\% acceptance rate},
bib2html_pubtype={Refereed Conference},
abstract={Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel three-step approach to improve model-free reinforcement learning agents’ performance. Specifically, we collect demonstration data, use the data to recover a linear function via inverse reinforcement learning, and use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high-dimensional domains. To show the scalability of our approach we present two sets of experiments in a two-dimensional Maze domain and the 27-dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state of the art in cumulative reward, learning rate, and asymptotic performance.}
}

• Yusen Zhan, Haitham Bou Ammar, and Matthew E. Taylor. Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), July 2016. 25% acceptance rate

Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teachers’ advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.

@inproceedings{2016IJCAI-Zhan,
author={Yusen Zhan and Haitham Bou Ammar and Matthew E. Taylor},
title={{Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer}},
booktitle={{Proceedings of the 25th International Joint Conference on Artificial Intelligence ({IJCAI})}},
month={July},
year={2016},
note={25\% acceptance rate},
bib2html_pubtype={Refereed Conference},
abstract={Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teachers’ advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.}
}
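
To make the setting concrete, here is a minimal, hypothetical sketch (not the paper's algorithm, and with no regret analysis): a Q-learning student on a toy chain MDP follows a teacher's suggested action while an advice budget remains; the environment, teachers, and all constants are invented for illustration.

```python
import random

# Invented chain MDP: start at state 0, reward 1.0 for reaching the goal state.
N_STATES, GOAL = 10, 9
ACTIONS = (-1, 1)  # step left / step right

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def good_teacher(s):
    return 1   # always advises moving toward the goal

def bad_teacher(s):
    return -1  # always advises moving away, wasting the budget

def train(teacher=None, budget=0, episodes=200, eps=0.2, alpha=0.5, gamma=0.95):
    random.seed(0)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    lengths = []
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            if teacher is not None and budget > 0:
                a, budget = teacher(s), budget - 1          # consume one unit of advice
            elif random.random() < eps:
                a = random.choice(ACTIONS)                  # autonomous exploration
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
            s, t = s2, t + 1
        lengths.append(t)
    return sum(lengths[:50]) / 50  # mean episode length early in learning

base = train()
helped = train(teacher=good_teacher, budget=200)
print(base, helped)
```

In this toy run the advised student tends to reach the goal far sooner in early episodes, matching the intuition that good teachers help, while substituting `bad_teacher` merely squanders the budget.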

2015

• Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, and Matthew E. Taylor. Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), January 2015. 27% acceptance rate

@inproceedings{2015AAAI-BouAamar,
author={Haitham Bou Ammar and Eric Eaton and Paul Ruvolo and Matthew E. Taylor},
title={{Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment}},
booktitle={{Proceedings of the 29th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
month={January},
year={2015},
note={27% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
}

• Tim Brys, Anna Harutyunyan, Matthew E. Taylor, and Ann Nowé. Policy Transfer using Reward Shaping. In The 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2015. 25% acceptance rate

Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, the most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespective of the learning algorithm used in the source task. We advance the state-of-the-art by using a reward shaping approach to policy transfer. One of the advantages of such an approach is that it firmly grounds policy transfer in an actively developing body of theoretical research on reward shaping. Experiments in Mountain Car, Cart Pole and Mario demonstrate the practical usefulness of the approach.

@inproceedings{2015AAMAS-Brys,
author={Tim Brys and Anna Harutyunyan and Matthew E. Taylor and Ann Now\'{e}},
title={{Policy Transfer using Reward Shaping}},
booktitle={{The 14th International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month={May},
year={2015},
note={25% acceptance rate},
bib2html_rescat={Reinforcement Learning, Transfer Learning},
bib2html_pubtype={Refereed Conference},
abstract={Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, the most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespective of the learning algorithm used in the source task. We advance the state-of-the-art by using a reward shaping approach to policy transfer. One of the advantages of such an approach is that it firmly grounds policy transfer in an actively developing body of theoretical research on reward shaping. Experiments in Mountain Car, Cart Pole and Mario demonstrate the practical usefulness of the approach.},
}

• Tim Brys, Anna Harutyunyan, Halit Bener Suay, Sonia Chernova, Matthew E. Taylor, and Ann Nowé. Reinforcement Learning from Demonstration through Shaping. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015. 28.8% acceptance rate

Reinforcement learning describes how a learning agent can achieve optimal behaviour based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behaviour. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus the learned behaviour. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.

@inproceedings{2015IJCAI-Brys,
author={Tim Brys and Anna Harutyunyan and Halit Bener Suay and Sonia Chernova and Matthew E. Taylor and Ann Now\'e},
title={{Reinforcement Learning from Demonstration through Shaping}},
booktitle={{Proceedings of the International Joint Conference on Artificial Intelligence ({IJCAI})}},
year={2015},
note={28.8% acceptance rate},
bib2html_rescat={Reinforcement Learning},
bib2html_pubtype={Refereed Conference},
abstract={Reinforcement learning describes how a learning agent can achieve optimal behaviour based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behaviour. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus the learned behaviour. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.}
}
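
One ingredient of such an approach can be sketched with an assumed functional form: a state-action potential that is high near demonstrated (state, action) pairs, here via Gaussian similarity. The demonstration data below are invented.

```python
import math

DEMO = [((0.1, 0.0), 1), ((0.3, 0.1), 1), ((0.6, 0.2), 0)]  # (state, action) pairs
SIGMA = 0.2

def potential(state, action):
    """Best Gaussian similarity to any demonstrated pair with the same action."""
    sims = [math.exp(-sum((x - y) ** 2 for x, y in zip(state, ds)) / (2 * SIGMA ** 2))
            for ds, da in DEMO if da == action]
    return max(sims, default=0.0)

def shaping_reward(s, a, s2, a2, gamma=0.99):
    # Potential-based term added to the environment reward; in the standard
    # shaping framework this biases exploration toward demonstrated behaviour
    # without changing the optimal policy, even for suboptimal demonstrations.
    return gamma * potential(s2, a2) - potential(s, a)

# Near a demonstrated pair the potential approaches 1; far away it decays to 0.
print(potential((0.1, 0.0), 1), potential((5.0, 5.0), 1))
```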

2014

• Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, and Matthew E. Taylor. Online Multi-Task Learning for Policy Gradient Methods. In Proceedings of the 31st International Conference on Machine Learning (ICML), June 2014. 25% acceptance rate
@inproceedings{2014ICML-BouAmmar,
author={Haitham Bou Ammar and Eric Eaton and Paul Ruvolo and Matthew E. Taylor},
title={{Online Multi-Task Learning for Policy Gradient Methods}},
booktitle={{Proceedings of the 31st International Conference on Machine Learning ({ICML})}},
note={25% acceptance rate},
month={June},
year={2014},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
}

• Tim Brys, Anna Harutyunyan, Peter Vrancx, Matthew E. Taylor, Daniel Kudenko, and Ann Nowé. Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping. In Proceedings of the IEEE 2014 International Joint Conference on Neural Networks (IJCNN), July 2014. 59% acceptance rate
@inproceedings{2014IJCNN-Brys,
author={Tim Brys and Anna Harutyunyan and Peter Vrancx and Matthew E. Taylor and Daniel Kudenko and Ann Now\'{e}},
title={{Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping}},
booktitle={{Proceedings of the {IEEE} 2014 International Joint Conference on Neural Networks ({IJCNN})}},
month={July},
year={2014},
note={59% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Tim Brys, Ann Nowé, Daniel Kudenko, and Matthew E. Taylor. Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), July 2014. 28% acceptance rate
@inproceedings{2014AAAI-Brys,
author={Tim Brys and Ann Now\'{e} and Daniel Kudenko and Matthew E. Taylor},
title={{Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence}},
booktitle={{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
month={July},
year={2014},
note={28% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. An Autonomous Transfer Learning Algorithm for TD-Learners. In Proceedings of the 8th Hellenic Conference on Artificial Intelligence (SETN), May 2014. 50% acceptance rate
@inproceedings{2014SETN-Fachantidis,
author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
title={{An Autonomous Transfer Learning Algorithm for TD-Learners}},
booktitle={{Proceedings of the 8th Hellenic Conference on Artificial Intelligence ({SETN})}},
note={50% acceptance rate},
month={May},
year={2014},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Transfer Learning, Reinforcement Learning},
}

• Chris HolmesParker, Matthew E. Taylor, Adrian Agogino, and Kagan Tumer. CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning. In Proceedings of the 2014 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT), August 2014. 43% acceptance rate
@inproceedings{2014IAT-HolmesParker,
author={Chris HolmesParker and Matthew E. Taylor and Adrian Agogino and Kagan Tumer},
title={{{CLEAN}ing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning}},
booktitle={{Proceedings of the 2014 {IEEE/WIC/ACM} International Conference on Intelligent Agent Technology ({IAT})}},
month={August},
year={2014},
note={43% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Robert Loftin, Bei Peng, James MacGlashan, Michael Littman, Matthew E. Taylor, David Roberts, and Jeff Huang. Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), August 2014.
@inproceedings{2014ROMAN-Loftin,
author={Robert Loftin and Bei Peng and James MacGlashan and Michael Littman and Matthew E. Taylor and David Roberts and Jeff Huang},
title={{Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies}},
booktitle={{Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication ({RO-MAN})}},
month={August},
year={2014},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, Jeff Huang, and David L. Roberts. A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), July 2014. 28% acceptance rate
@inproceedings{2014AAAI-Loftin,
author={Robert Loftin and Bei Peng and James MacGlashan and Michael L. Littman and Matthew E. Taylor and Jeff Huang and David L. Roberts},
title={{A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback}},
booktitle={{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
month={July},
year={2014},
note={28% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

• Matthew E. Taylor and Lisa Torrey. Agents Teaching Agents in Reinforcement Learning (Nectar Abstract). In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD), September 2014. Nectar Track, 45% acceptance rate
@inproceedings{2014ECML-Taylor,
author={Matthew E. Taylor and Lisa Torrey},
title={{Agents Teaching Agents in Reinforcement Learning (Nectar Abstract)}},
booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD)}},
month={September},
year={2014},
note={Nectar Track, 45% acceptance rate},
bib2html_pubtype={Refereed Conference},
bib2html_rescat={Reinforcement Learning},
}

2013

• Haitham Bou Ammar, Decebal Constantin Mocanu, Matthew E. Taylor, Kurt Driessens, Karl Tuyls, and Gerhard Weiss. Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), September 2013. 25% acceptance rate
@inproceedings{ECML13-BouAamar,
author={Haitham Bou Ammar and Decebal Constantin Mocanu and Matthew E. Taylor and Kurt Driessens and Karl Tuyls and Gerhard Weiss},
title={{Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines}},
booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ({ECML PKDD})}},
month={September},
year = {2013},
note = {25% acceptance rate},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning},
}

• Tong Pham, Aly Tawfik, and Matthew E. Taylor. A Simple, Naive Agent-based Model for the Optimization of a System of Traffic Lights: Insights from an Exploratory Experiment. In Proceedings of Conference on Agent-Based Modeling in Transportation Planning and Operations, September 2013.
@inproceedings{abm13-Pham,
author="Tong Pham and Aly Tawfik and Matthew E. Taylor",
title={{A Simple, Naive Agent-based Model for the Optimization of a System of Traffic Lights: Insights from an Exploratory Experiment}},
booktitle={{Proceedings of Conference on Agent-Based Modeling in Transportation Planning and Operations}},
month="September",
year = {2013},
bib2html_rescat = {DCOP},
bib2html_pubtype = {Refereed Conference},
}

• Lisa Torrey and Matthew E. Taylor. Teaching on a Budget: Agents Advising Agents in Reinforcement Learning. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2013. 23% acceptance rate

This paper introduces a teacher-student framework for reinforcement learning. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two experimental domains: Mountain Car and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

@inproceedings{AAMAS13-Torrey,
author="Lisa Torrey and Matthew E. Taylor",
title={{Teaching on a Budget: Agents Advising Agents in Reinforcement Learning}},
booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2013},
note = {23% acceptance rate},
wwwnote = {<a href="http://aamas2013.cs.umn.edu/">AAMAS-13</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning},
abstract = "This paper introduces a teacher-student framework for reinforcement
learning. In this framework, a teacher agent instructs a student
agent by suggesting actions the student should take as it learns.
However, the teacher may only give such advice a limited number
of times. We present several novel algorithms that teachers can
use to budget their advice effectively, and we evaluate them in two
experimental domains: Mountain Car and Pac-Man. Our results
show that the same amount of advice, given at different moments,
can have different effects on student learning, and that teachers can
significantly affect student learning even when students use different
learning methods and state representations.",
}
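
A minimal sketch of the budgeted-advice idea follows; the importance heuristic below (advise where the teacher's own Q-values say the action choice matters most) is one common choice for illustration, not necessarily the paper's exact algorithm, and all state names and numbers are invented.

```python
class Teacher:
    def __init__(self, q_table, budget, threshold):
        self.q, self.budget, self.threshold = q_table, budget, threshold

    def advise(self, state):
        """Return an advised action, or None to let the student act alone."""
        qs = self.q[state]
        importance = max(qs.values()) - min(qs.values())  # how much the choice matters
        if self.budget > 0 and importance >= self.threshold:
            self.budget -= 1
            return max(qs, key=qs.get)
        return None  # stay silent and save the budget

q = {
    "cliff_edge": {"left": 1.0, "right": -10.0},  # a mistake is costly: advise
    "open_field": {"left": 0.49, "right": 0.51},  # nearly indifferent: stay quiet
}
teacher = Teacher(q, budget=5, threshold=1.0)
print(teacher.advise("cliff_edge"), teacher.advise("open_field"))
```

Spending the budget only on high-importance states is what lets the same amount of advice, given at different moments, have very different effects on student learning.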

2012

• Haitham Bou Ammar, Karl Tuyls, Matthew E. Taylor, Kurt Driessens, and Gerhard Weiss. Reinforcement Learning Transfer via Sparse Coding. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), June 2012. 20% acceptance rate

Although reinforcement learning (RL) has been successfully deployed in a variety of tasks, learning speed remains a fundamental problem for applying RL in complex environments. Transfer learning aims to ameliorate this shortcoming by speeding up learning through the adaptation of previously learned behaviors in similar tasks. Transfer techniques often use an inter-task mapping, which determines how a pair of tasks are related. Instead of relying on a hand-coded inter-task mapping, this paper proposes a novel transfer learning method capable of autonomously creating an inter-task mapping by using a novel combination of sparse coding, sparse projection learning and sparse Gaussian processes. We also propose two new transfer algorithms (TrLSPI and TrFQI) based on least squares policy iteration and fitted-Q-iteration. Experiments not only show successful transfer of information between similar tasks, inverted pendulum to cart pole, but also between two very different domains: mountain car to cart pole. This paper empirically shows that the learned inter-task mapping can be successfully used to (1) improve the performance of a learned policy on a fixed number of environmental samples, (2) reduce the learning times needed by the algorithms to converge to a policy on a fixed number of samples, and (3) converge faster to a near-optimal policy given a large number of samples.

@inproceedings{12AAMAS-Haitham,
author="Haitham Bou Ammar and Karl Tuyls and Matthew E. Taylor and Kurt Driessens and Gerhard Weiss",
title={{Reinforcement Learning Transfer via Sparse Coding}},
booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="June",
year = {2012},
note = {20% acceptance rate},
wwwnote = {<a href="http://aamas2012.webs.upv.es">AAMAS-12</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning},
abstract = "Although reinforcement learning (RL) has been successfully deployed
in a variety of tasks, learning speed remains a fundamental
problem for applying RL in complex environments. Transfer learning
aims to ameliorate this shortcoming by speeding up learning
through the adaptation of previously learned behaviors in similar
tasks. Transfer techniques often use an inter-task mapping, which
determines how a pair of tasks are related. Instead of relying on a
hand-coded inter-task mapping, this paper proposes a novel transfer
learning method capable of autonomously creating an inter-task
mapping by using a novel combination of sparse coding, sparse
projection learning and sparse Gaussian processes. We also propose
two new transfer algorithms (TrLSPI and TrFQI) based on
least squares policy iteration and fitted-Q-iteration. Experiments
not only show successful transfer of information between similar
tasks, inverted pendulum to cart pole, but also between two very
different domains: mountain car to cart pole. This paper empirically
shows that the learned inter-task mapping can be successfully
used to (1) improve the performance of a learned policy on a fixed
number of environmental samples, (2) reduce the learning times
needed by the algorithms to converge to a policy on a fixed number
of samples, and (3) converge faster to a near-optimal policy given
a large number of samples.",
}

2011

• Matthew E. Taylor, Halit Bener Suay, and Sonia Chernova. Integrating Reinforcement Learning with Human Demonstrations of Varying Ability. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
@inproceedings{11AAMAS-HAT-Taylor,
author="Matthew E. Taylor and Halit Bener Suay and Sonia Chernova",
title = {{Integrating Reinforcement Learning with Human Demonstrations of Varying Ability}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2011},
note = {22% acceptance rate},
wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning},
}

• Matthew E. Taylor, Brian Kulis, and Fei Sha. Metric Learning for Reinforcement Learning Agents. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
@inproceedings{11AAMAS-MetricLearn-Taylor,
author="Matthew E. Taylor and Brian Kulis and Fei Sha",
title = {{Metric Learning for Reinforcement Learning Agents}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2011},
note = {22% acceptance rate},
wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning},
}

• Jason Tsai, Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein, Gal Kaminka, Stacy Marsella, Andrew Ogden, Inbal Rika, Ankur Sheel, Matthew E. Taylor, Xuezhi Wang, Avishay Zilka, and Milind Tambe. ESCAPES: Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social Comparison. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
@inproceedings{11AAMAS-Tsai,
author={Jason Tsai and Natalie Fridman and Emma Bowring and Matthew Brown and Shira Epstein and Gal Kaminka and Stacy Marsella and Andrew Ogden and Inbal Rika and Ankur Sheel and Matthew E. Taylor and {Xuezhi Wang} and Avishay Zilka and Milind Tambe},
title = {{ESCAPES: Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social Comparison}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2011},
note = {22% acceptance rate},
wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
bib2html_pubtype = {Refereed Conference},
}

2010

• Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, and Milind Tambe. When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2010. 24% acceptance rate

Increasing teamwork between agents typically increases the performance of a multi-agent system, at the cost of increased communication and higher computational complexity. This work examines joint actions in the context of a multi-agent optimization problem where agents must cooperate to balance exploration and exploitation. Surprisingly, results show that increased teamwork can hurt agent performance, even when communication and computation costs are ignored, which we term the team uncertainty penalty. This paper introduces the above phenomena, analyzes it, and presents algorithms to reduce the effect of the penalty in our problem setting.

@inproceedings{AAMAS10-Taylor,
author = {Matthew E. Taylor and Manish Jain and Yanquin Jin and Makoto Yokoo and Milind Tambe},
title = {{When Should There be a ``Me'' in ``Team''? {D}istributed Multi-Agent Optimization Under Uncertainty}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year = {2010},
note = {24% acceptance rate},
wwwnote = {<a href="http://www.cse.yorku.ca/AAMAS2010/index.php">AAMAS-10</a>},
abstract={Increasing teamwork between agents typically increases the
performance of a multi-agent system, at the cost of increased
communication and higher computational complexity. This work examines
joint actions in the context of a multi-agent optimization problem
where agents must cooperate to balance exploration and
exploitation. Surprisingly, results show that increased teamwork can
hurt agent performance, even when communication and computation costs
are ignored, which we term the team uncertainty penalty. This paper
introduces the above phenomena, analyzes it, and presents algorithms
to reduce the effect of the penalty in our problem setting.},
wwwnote={Supplemental material is available at <a href="http://teamcore.usc.edu/dcop/">http://teamcore.usc.edu/dcop/</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {DCOP},
}

• Matthew E. Taylor, Katherine E. Coons, Behnam Robatmili, Bertrand A. Maher, Doug Burger, and Kathryn S. McKinley. Evolving Compiler Heuristics to Manage Communication and Contention. In Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI), July 2010. Nectar Track, 25% acceptance rate

As computer architectures become increasingly complex, hand-tuning compiler heuristics becomes increasingly tedious and time consuming for compiler developers. This paper presents a case study that uses a genetic algorithm to learn a compiler policy. The target policy implicitly balances communication and contention among processing elements of the TRIPS processor, a physically realized prototype chip. We learn specialized policies for individual programs as well as general policies that work well across all programs. We also employ a two-stage method that first classifies the code being compiled based on salient characteristics, and then chooses a specialized policy based on that classification. This work is particularly interesting for the AI community because it (1) emphasizes the need for increased collaboration between AI researchers and researchers from other branches of computer science and (2) discusses a machine learning setup where training on the custom hardware requires weeks of training, rather than the more typical minutes or hours.

@inproceedings(AAAI10-Nectar-taylor,
author="Matthew E. Taylor and Katherine E. Coons and Behnam Robatmili and Bertrand A. Maher and Doug Burger and Kathryn S. McKinley",
title={{Evolving Compiler Heuristics to Manage Communication and Contention}},
note = "Nectar Track, 25% acceptance rate",
booktitle={{Proceedings of the Twenty-Fourth Conference on Artificial Intelligence ({AAAI})}},
month="July",
year="2010",
abstract="
As computer architectures become increasingly complex, hand-tuning
compiler heuristics becomes increasingly tedious and time consuming
for compiler developers. This paper presents a case study that uses a
genetic algorithm to learn a compiler policy. The target policy
implicitly balances communication and contention among processing
elements of the TRIPS processor, a physically realized prototype chip.
We learn specialized policies for individual programs as well as
general policies that work well across all programs. We also employ a
two-stage method that first classifies the code being compiled based
on salient characteristics, and then chooses a specialized policy
based on that classification.
<br>
This work is particularly interesting for the AI community because it
(1) emphasizes the need for increased collaboration between AI
researchers and researchers from other branches of computer science
and (2) discusses a machine learning setup where training on the custom
hardware requires weeks of training, rather than the more typical
minutes or hours.",
wwwnote={<a href="http://www.aaai.org/Conferences/AAAI/aaai10.php">AAAI-2010</a>. This paper is based on results presented in our earlier <a href="b2hd-PACT08-coons.html">PACT-08 paper</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Genetic Algorithms},
bib2html_funding = {NSF, DARPA}
)
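
A toy genetic algorithm in the spirit of the case study can be sketched as follows; every detail here is invented. An individual is a weight vector for a placement cost heuristic, and fitness would normally come from compiling and simulating a benchmark (the weeks-long evaluation mentioned above), so a hidden "best" weighting stands in for that expensive measurement.

```python
import random

random.seed(0)
TARGET = [0.7, 0.2, 0.1]  # stand-in for the unknown best heuristic weights

def fitness(weights):
    # Real setting: run the compiler with these weights and measure speed-up.
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def evolve(pop_size=30, generations=40, sigma=0.1):
    pop = [[random.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, 3)
            child = p1[:cut] + p2[cut:]                                   # one-point crossover
            children.append([g + random.gauss(0, sigma) for g in child])  # Gaussian mutation
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```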

2009

• Manish Jain, Matthew E. Taylor, Makoto Yokoo, and Milind Tambe. DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI), July 2009. 26% acceptance rate

Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.

@inproceedings(IJCAI09-Jain,
author="Manish Jain and Matthew E. Taylor and Makoto Yokoo and Milind Tambe",
title={{{DCOP}s Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks}},
booktitle={{Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence ({IJCAI})}},
month="July",
year= "2009",
note = {26% acceptance rate},
wwwnote={<a href="http://www.ijcai-09.org">IJCAI-2009</a>},
abstract={Buoyed by recent successes in the area of distributed
constraint optimization problems (DCOPs), this paper addresses
challenges faced when applying DCOPs to real-world domains. Three
fundamental challenges must be addressed for a class of real-world
domains, requiring novel DCOP algorithms. First, agents may not
know the payoff matrix and must explore the environment to
determine rewards associated with variable settings. Second,
agents may need to maximize total accumulated reward rather than
instantaneous final reward. Third, limited time horizons disallow
exhaustive exploration of the environment. We propose and
implement a set of novel algorithms that combine
decision-theoretic exploration approaches with DCOP-mandated
coordination. In addition to simulation results, we implement
these algorithms on robots, deploying DCOPs on a distributed
mobile sensor network.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {DCOP, Robotics},
bib2html_funding = {DARPA}
)
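
The exploration problem can be sketched for a single pair of neighbouring agents; the algorithmic details below are assumed for illustration, not taken from the paper. The payoff matrix over the agents' joint variable assignment is unknown, reward accumulates every round, and the horizon is short, so each joint assignment is sampled once and the best estimate is exploited afterwards.

```python
import random

random.seed(2)
SETTINGS = range(3)
TRUE = {(i, j): random.random() for i in SETTINGS for j in SETTINGS}  # hidden payoffs

def noisy_payoff(joint):
    return TRUE[joint] + random.gauss(0, 0.05)

def explore_then_exploit(horizon=30):
    joints = [(i, j) for i in SETTINGS for j in SETTINGS]
    seen, total = {}, 0.0
    for t in range(horizon):
        if t < len(joints):
            joint = joints[t]                # visit each joint assignment once
        else:
            joint = max(seen, key=seen.get)  # then exploit the best estimate
        r = noisy_payoff(joint)
        # Crude running estimate: blend the old estimate with the new sample.
        seen[joint] = r if joint not in seen else (seen[joint] + r) / 2
        total += r
    return total / horizon

avg = explore_then_exploit()
print(avg)
```

Because the horizon disallows exhaustive re-sampling, the average per-round reward comes mostly from the exploited assignment rather than the uniform exploration phase.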

• Pradeep Varakantham, Jun-young Kwak, Matthew E. Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping. In Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling (ICAPS), September 2009. 34% acceptance rate

Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

@inproceedings(ICAPS09-Varakantham,
author="Pradeep Varakantham and Jun-young Kwak and Matthew E. Taylor and Janusz Marecki and Paul Scerri and Milind Tambe",
title={{Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping}},
booktitle={{Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling ({ICAPS})}},
month="September",
year= "2009",
note = {34% acceptance rate},
wwwnote={<a href="http://icaps09.uom.gr">ICAPS-2009</a>},
abstract={ Distributed POMDPs provide an expressive framework for
modeling multiagent collaboration problems, but NEXP-Complete
complexity hinders their scalability and application in real-world
domains. This paper introduces a subclass of distributed POMDPs,
and TREMOR, an algorithm to solve such distributed POMDPs. The
primary novelty of TREMOR is that agents plan individually with a
single agent POMDP solver and use social model shaping to
implicitly coordinate with other agents. Experiments demonstrate
that TREMOR can provide solutions orders of magnitude faster than
existing algorithms while achieving comparable, or even superior,
solution quality.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Distributed POMDPs},
bib2html_funding = {ARMY}
)

2008

• Katherine K. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Kathryn McKinley, and Doug Burger. Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning. In Proceedings of the Seventh International Joint Conference on Parallel Architectures and Compilation Techniques (PACT), pages 32-42, October 2008. 19% acceptance rate

Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribution of computation and data become increasingly important performance factors. Explicit Dataflow Graph Execution (EDGE) processors, in which instructions communicate with one another directly on a distributed substrate, give the compiler control over communication overheads at a fine granularity. Prior work shows that compilers can effectively reduce fine-grained communication overheads in EDGE architectures using a spatial instruction placement algorithm with a heuristic-based cost function. While this algorithm is effective, the cost function must be painstakingly tuned. Heuristics tuned to perform well across a variety of applications leave users with little ability to tune performance-critical applications, yet we find that the best placement heuristics vary significantly with the application.

First, we suggest a systematic feature selection method that reduces the feature set size based on the extent to which features affect performance. To automatically discover placement heuristics, we then use these features as input to a reinforcement learning technique, called Neuro-Evolution of Augmenting Topologies (NEAT), that uses a genetic algorithm to evolve neural networks. We show that NEAT outperforms simulated annealing, the most commonly used optimization technique for instruction placement. We use NEAT to learn general heuristics that are as effective as hand-tuned heuristics, but we find that improving over highly hand-tuned general heuristics is difficult. We then suggest a hierarchical approach to machine learning that classifies segments of code with similar characteristics and learns heuristics for these classes. This approach performs closer to the specialized heuristics. Together, these results suggest that learning compiler heuristics may benefit from both improved feature selection and classification.

@inproceedings{PACT08-coons,
author="Katherine K. Coons and Behnam Robatmili and Matthew E. Taylor and Bertrand A. Maher and Kathryn McKinley and Doug Burger",
title={{Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning}},
booktitle={{Proceedings of the Seventh International Joint Conference on Parallel Architectures and Compilation Techniques ({PACT})}},
month="October",
year="2008",
pages="32--42",
note = {19% acceptance rate},
wwwnote={<a href="http://www.eecg.toronto.edu/pact/">PACT-2008</a>},
abstract = {Communication overheads are one of the fundamental challenges in a
multiprocessor system. As the number of processors on a chip increases,
communication overheads and the distribution of computation and data
become increasingly important performance factors. Explicit Dataflow
Graph Execution (EDGE) processors, in which instructions communicate
with one another directly on a distributed substrate, give the compiler
control over communication overheads at a fine granularity. Prior work
shows that compilers can effectively reduce fine-grained communication
overheads in EDGE architectures using a spatial instruction placement
algorithm with a heuristic-based cost function. While this algorithm is
effective, the cost function must be painstakingly tuned. Heuristics tuned
to perform well across a variety of applications leave users with little
ability to tune performance-critical applications, yet we find that the
best placement heuristics vary significantly with the application.
<p>
First, we suggest a systematic feature selection method that reduces the
feature set size based on the extent to which features affect performance.
To automatically discover placement heuristics, we then use these features
as input to a reinforcement learning technique, called Neuro-Evolution
of Augmenting Topologies (NEAT), that uses a genetic algorithm to evolve
neural networks. We show that NEAT outperforms simulated annealing, the
most commonly used optimization technique for instruction placement. We
use NEAT to learn general heuristics that are as effective as hand-tuned
heuristics, but we find that improving over highly hand-tuned general
heuristics is difficult. We then suggest a hierarchical approach
to machine learning that classifies segments of code with similar
characteristics and learns heuristics for these classes. This approach
performs closer to the specialized heuristics. Together, these results
suggest that learning compiler heuristics may benefit from both improved
feature selection and classification.
},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Autonomic Computing, Machine Learning in Practice},
}

• Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. Transfer Learning and Intelligence: an Argument and Approach. In Proceedings of the First Conference on Artificial General Intelligence (AGI), March 2008. 50% acceptance rate

In order to claim fully general intelligence in an autonomous agent, the ability to learn is one of the most central capabilities. Classical machine learning techniques have had many significant empirical successes, but large real-world problems that are of interest to generally intelligent agents require learning much faster (with much less training experience) than is currently possible. This paper presents transfer learning, where knowledge from a learned task can be used to significantly speed up learning in a novel task, as the key to achieving the learning capabilities necessary for general intelligence. In addition to motivating the need for transfer learning in an intelligent agent, we introduce a novel method for selecting types of tasks to be used for transfer and empirically demonstrate that such a selection can lead to significant increases in training speed in a two-player game.

@inproceedings(AGI08-taylor,
author="Matthew E. Taylor and Gregory Kuhlmann and Peter Stone",
title={{Transfer Learning and Intelligence: an Argument and Approach}},
booktitle={{Proceedings of the First Conference on Artificial General Intelligence ({AGI})}},
month="March",
year="2008",
abstract="In order to claim fully general intelligence in an
autonomous agent, the ability to learn is one of the most
central capabilities. Classical machine learning techniques
have had many significant empirical successes, but large
real-world problems that are of interest to generally
intelligent agents require learning much faster (with much
less training experience) than is currently possible. This
paper presents transfer learning, where knowledge
from a learned task can be used to significantly speed up
learning in a novel task, as the key to achieving the
learning capabilities necessary for general intelligence. In
addition to motivating the need for transfer learning in an
intelligent agent, we introduce a novel method for selecting
types of tasks to be used for transfer and empirically
demonstrate that such a selection can lead to significant
increases in training speed in a two-player game.",
note = {50% acceptance rate},
wwwnote={<a href="http://agi-08.org/">AGI-2008</a><br> A video
of the talk is available.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning},
bib2html_funding = {NSF, DARPA},
)

• Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. Autonomous Transfer for Reinforcement Learning. In Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 283-290, May 2008. 22% acceptance rate

Recent work in transfer learning has succeeded in making reinforcement learning algorithms more efficient by incorporating knowledge from previous tasks. However, such methods typically must be provided either a full model of the tasks or an explicit relation mapping one task into the other. An autonomous agent may not have access to such high-level information, but would be able to analyze its experience to find similarities between tasks. In this paper we introduce Modeling Approximate State Transitions by Exploiting Regression (MASTER), a method for automatically learning a mapping from one task to another through an agent’s experience. We empirically demonstrate that such learned relationships can significantly improve the speed of a reinforcement learning algorithm in a series of Mountain Car tasks. Additionally, we demonstrate that our method may also assist with the difficult problem of task selection for transfer.
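
To make the core idea concrete, here is a minimal sketch (with invented names and a deliberately crude transition model, not the MASTER implementation): fit a one-step model on source-task experience, then score candidate state-variable mappings by how well that model predicts mapped target-task transitions, keeping the lowest-error mapping.

```python
# Hedged sketch of learning an inter-task mapping from experience.
# The "model" here is just an average per-variable delta; MASTER uses
# regression over richer transition data.
import itertools

def fit_model(transitions):
    """Fit a crude one-step model: the mean change of each state variable."""
    n = len(transitions[0][0])
    deltas = [0.0] * n
    for s, s_next in transitions:
        for i in range(n):
            deltas[i] += (s_next[i] - s[i]) / len(transitions)
    return deltas

def mapping_error(model, target_transitions, mapping):
    """Squared one-step prediction error of the source model under a
    candidate mapping (mapping[i] = source variable for target variable i)."""
    err = 0.0
    for s, s_next in target_transitions:
        for tgt_i, src_i in enumerate(mapping):
            pred = s[tgt_i] + model[src_i]
            err += (s_next[tgt_i] - pred) ** 2
    return err

def best_mapping(source_transitions, target_transitions):
    """Exhaustively score candidate mappings and return the best one."""
    model = fit_model(source_transitions)
    n_src = len(source_transitions[0][0])
    n_tgt = len(target_transitions[0][0])
    candidates = itertools.permutations(range(n_src), n_tgt)
    return min(candidates,
               key=lambda m: mapping_error(model, target_transitions, m))
```

For example, if the target task's state variables behave like the source task's variables with their order swapped, the search recovers the swapped mapping from transition data alone.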

@inproceedings{AAMAS08-taylor,
author="Matthew E. Taylor and Gregory Kuhlmann and Peter Stone",
title={{Autonomous Transfer for Reinforcement Learning}},
booktitle={{Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="May",
year="2008",
pages="283--290",
abstract={Recent work in transfer learning has succeeded in
making reinforcement learning algorithms more
efficient by incorporating knowledge from previous
tasks. However, such methods typically must be
provided either a full model of the tasks or an
explicit relation mapping one task into the
other. An autonomous agent may not have access to
such high-level information, but would be able to
analyze its experience to find similarities between
tasks. In this paper we introduce Modeling
Approximate State Transitions by Exploiting
Regression (MASTER), a method for automatically
learning a mapping from one task to another through
an agent's experience. We empirically demonstrate
that such learned relationships can significantly
improve the speed of a reinforcement learning
algorithm in a series of Mountain Car
tasks. Additionally, we demonstrate that our method
may also assist with the difficult problem of task
selection for transfer.},
note = {22% acceptance rate},
wwwnote={<a href="http://gaips.inesc-id.pt/aamas2008/">AAMAS-2008</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {DARPA, NSF}
}

• Matthew E. Taylor, Nicholas K. Jong, and Peter Stone. Transferring Instances for Model-Based Reinforcement Learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pages 488-505, September 2008. 19% acceptance rate

Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have primarily been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample efficiency and asymptotic performance of a model-based algorithm when learning in a continuous state space. Additionally, we conduct experiments to test the limits of TIMBREL’s effectiveness.
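
A toy illustration of instance transfer into a model-based learner (our names and simplifications, not TIMBREL's code): when the target task has too few transition instances near a query state, the model falls back on translated source-task instances to predict the next state.

```python
# Hedged sketch: a 1-D instance-based transition model that pads sparse
# target-task data with (already-translated) source-task instances.
def predict(query, target_instances, source_instances, k=1, min_target=2):
    """Predict the next state for `query` from (state, next_state) pairs,
    using source instances only when target data is too sparse."""
    pool = (target_instances if len(target_instances) >= min_target
            else target_instances + source_instances)
    pool = sorted(pool, key=lambda inst: abs(inst[0] - query))
    nearest = pool[:k]
    return sum(s_next for _, s_next in nearest) / len(nearest)
```

Early in target-task learning the source instances dominate predictions; as target data accumulates past the threshold, the model relies on target experience alone.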

@inproceedings(ECML08-taylor,
author="Matthew E. Taylor and Nicholas K. Jong and Peter Stone",
title={{Transferring Instances for Model-Based Reinforcement Learning}},
booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ({ECML PKDD})}},
pages="488--505",
month="September",
year= "2008",
note = {19% acceptance rate},
wwwnote={<a href="http://www.ecmlpkdd2008.org/">ECML-2008</a>},
abstract={Reinforcement learning agents typically require a significant
amount of data before performing well on complex tasks. Transfer
learning methods have made progress reducing sample complexity,
but they have primarily been applied to model-free learning
methods, not more data-efficient model-based learning
methods. This paper introduces TIMBREL, a novel method capable of
transferring information effectively into a model-based
reinforcement learning algorithm. We demonstrate that TIMBREL can
significantly improve the sample efficiency and asymptotic
performance of a model-based algorithm when learning in a
continuous state space. Additionally, we conduct experiments to
test the limits of TIMBREL's effectiveness.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Transfer Learning, Reinforcement Learning, Planning},
bib2html_funding = {NSF, DARPA}
)

2007

• Mazda Ahmadi, Matthew E. Taylor, and Peter Stone. IFSA: Incremental Feature-Set Augmentation for Reinforcement Learning Tasks. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1120-1127, May 2007. 22% acceptance rate, Finalist for Best Student Paper

Reinforcement learning is a popular and successful framework for many agent-related problems because only limited environmental feedback is necessary for learning. While many algorithms exist to learn effective policies in such problems, learning is often used to solve real world problems, which typically have large state spaces, and therefore suffer from the “curse of dimensionality.” One effective method for speeding-up reinforcement learning algorithms is to leverage expert knowledge. In this paper, we propose a method for dynamically augmenting the agent’s feature set in order to speed up value-function-based reinforcement learning. The domain expert divides the feature set into a series of subsets such that a novel problem concept can be learned from each successive subset. Domain knowledge is also used to order the feature subsets in order of their importance for learning. Our algorithm uses the ordered feature subsets to learn tasks significantly faster than if the entire feature set is used from the start. Incremental Feature-Set Augmentation (IFSA) is fully implemented and tested in three different domains: Gridworld, Blackjack and RoboCup Soccer Keepaway. All experiments show that IFSA can significantly speed up learning, motivating the applicability of this novel RL method.
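
The mechanical core of the idea can be sketched in a few lines (ours, assuming a linear value function; the paper's learners are more elaborate): when a new feature subset is introduced, the weights learned so far are kept and the new features' weights start at zero, so the augmented function initially computes exactly the same values.

```python
# Hedged sketch of incremental feature-set augmentation for a linear
# value function: carry learned weights forward, zero-initialize the rest.
def augment(weights, n_new):
    """Extend a learned weight vector with zeros for newly added features."""
    return weights + [0.0] * n_new

def value(weights, features):
    """Linear value estimate over however many features are active."""
    return sum(w * f for w, f in zip(weights, features))
```

Because the new weights start at zero, augmentation preserves the value estimates learned on the smaller feature set, and learning continues from there rather than from scratch.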

@inproceedings{AAMAS07-ahmadi,
author="Mazda Ahmadi and Matthew E. Taylor and Peter Stone",
title={{{IFSA}: Incremental Feature-Set Augmentation for Reinforcement Learning Tasks}},
booktitle={{Proceedings of the the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
pages="1120--1127",
month="May",
year="2007",
abstract={
Reinforcement learning is a popular and successful framework for
many agent-related problems because only limited environmental
feedback is necessary for learning. While many algorithms exist to
learn effective policies in such problems, learning is often
used to solve real world problems, which typically have large state
spaces, and therefore suffer from the ``curse of dimensionality.''
One effective method for speeding-up reinforcement learning algorithms
is to leverage expert knowledge. In this paper, we propose a method
for dynamically augmenting the agent's feature set in order to
speed up value-function-based reinforcement learning. The domain
expert divides the feature set into a series of subsets such that a
novel problem concept can be learned from each successive
subset. Domain knowledge is also used to order the feature subsets in
order of their importance for learning. Our algorithm uses the
ordered feature subsets to learn tasks significantly faster than if
the entire feature set is used from the start. Incremental
Feature-Set Augmentation (IFSA) is fully implemented and tested in
three different domains: Gridworld, Blackjack and RoboCup Soccer
Keepaway. All experiments show that IFSA can significantly speed up
learning, motivating the applicability of this novel RL method.},
note = {22% acceptance rate, Finalist for Best Student Paper},
wwwnote={<span align="left" style="color: red; font-weight: bold">Best Student Paper Nomination</span> at <a href="http://www.aamas2007.nl/">AAMAS-2007</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning},
bib2html_funding = {DARPA, NSF,ONR},
}

• Matthew E. Taylor, Cynthia Matuszek, Pace Reagan Smith, and Michael Witbrock. Guiding Inference with Policy Search Reinforcement Learning. In Proceedings of the Twentieth International FLAIRS Conference (FLAIRS), May 2007. 52% acceptance rate

Symbolic reasoning is a well understood and effective approach to handling reasoning over formally represented knowledge; however, simple symbolic inference systems necessarily slow as complexity and ground facts grow. As automated approaches to ontology-building become more prevalent and sophisticated, knowledge base systems become larger and more complex, necessitating techniques for faster inference. This work uses reinforcement learning, a statistical machine learning technique, to learn control laws which guide inference. We implement our learning method in ResearchCyc, a very large knowledge base with millions of assertions. A large set of test queries, some of which require tens of thousands of inference steps to answer, can be answered faster after training over an independent set of training queries. Furthermore, this learned inference module outperforms ResearchCyc’s integrated inference module, a module that has been hand-tuned with considerable effort.

@inproceedings{FLAIRS07-taylor-inference,
author="Matthew E. Taylor and Cynthia Matuszek and Pace Reagan Smith and Michael Witbrock",
title={{Guiding Inference with Policy Search Reinforcement Learning}},
booktitle={{Proceedings of the Twentieth International FLAIRS Conference ({FLAIRS})}},
month="May",
year="2007",
abstract="Symbolic reasoning is a well understood and
effective approach to handling reasoning over
formally represented knowledge; however, simple
symbolic inference systems necessarily slow as
complexity and ground facts grow. As automated
approaches to ontology-building become more
prevalent and sophisticated, knowledge base systems
become larger and more complex, necessitating
techniques for faster inference. This work uses
reinforcement learning, a statistical machine
learning technique, to learn control laws which
guide inference. We implement our learning method in
ResearchCyc, a very large knowledge base with
millions of assertions. A large set of test queries,
some of which require tens of thousands of inference
steps to answer, can be answered faster after
training over an independent set of training
queries. Furthermore, this learned inference module
outperforms ResearchCyc's integrated inference
module, a module that has been hand-tuned with
considerable effort.",
note = {52% acceptance rate},
wwwnote={<a href="http://www.cise.ufl.edu/~ddd/FLAIRS/flairs2007/">FLAIRS-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Inference, Machine Learning in Practice},
bib2html_funding = {DARPA},
}

• Matthew E. Taylor, Cynthia Matuszek, Bryan Klimt, and Michael Witbrock. Autonomous Classification of Knowledge into an Ontology. In Proceedings of the Twentieth International FLAIRS Conference (FLAIRS), May 2007. 52% acceptance rate

Ontologies are an increasingly important tool in knowledge representation, as they allow large amounts of data to be related in a logical fashion. Current research is concentrated on automatically constructing ontologies, merging ontologies with different structures, and optimal mechanisms for ontology building; in this work we consider the related, but distinct, problem of how to automatically determine where to place new knowledge into an existing ontology. Rather than relying on human knowledge engineers to carefully classify knowledge, it is becoming increasingly important for machine learning techniques to automate such a task. Automation is particularly important as the rate of ontology building via automatic knowledge acquisition techniques increases. This paper compares three well-established machine learning techniques and shows that they can be applied successfully to this knowledge placement task. Our methods are fully implemented and tested in the Cyc knowledge base system.
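
A toy version of the placement task (our illustration, far simpler than the classifiers compared in the paper): score each ontology node by feature overlap with a new assertion and place the assertion under the best-scoring node.

```python
# Hedged sketch of automatic knowledge placement: a keyword-overlap
# classifier over ontology nodes (a stand-in for the learned classifiers).
def place(assertion_keywords, ontology):
    """ontology: dict mapping node name -> set of characteristic keywords.
    Returns the node whose keywords best overlap the new assertion."""
    return max(ontology, key=lambda node: len(assertion_keywords & ontology[node]))
```

A real system would learn the per-node features from existing classified knowledge rather than listing them by hand.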

@inproceedings{FLAIRS07-taylor-ontology,
author="Matthew E. Taylor and Cynthia Matuszek and Bryan Klimt and Michael Witbrock",
title={{Autonomous Classification of Knowledge into an Ontology}},
booktitle={{Proceedings of the Twentieth International FLAIRS Conference ({FLAIRS})}},
month="May",
year="2007",
abstract="Ontologies are an increasingly important tool in
knowledge representation, as they allow large amounts of data
to be related in a logical fashion. Current research is
concentrated on automatically constructing ontologies, merging
ontologies with different structures, and optimal mechanisms
for ontology building; in this work we consider the related,
but distinct, problem of how to automatically determine where
to place new knowledge into an existing ontology. Rather than
relying on human knowledge engineers to carefully classify
knowledge, it is becoming increasingly important for machine
learning techniques to automate such a task. Automation is
particularly important as the rate of ontology building via
automatic knowledge acquisition techniques increases. This
paper compares three well-established machine learning
techniques and shows that they can be applied successfully to
this knowledge placement task. Our methods are fully
implemented and tested in the Cyc knowledge base system.",
note = {52% acceptance rate},
wwwnote={<a href="http://www.cise.ufl.edu/~ddd/FLAIRS/flairs2007/">FLAIRS-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Ontologies, Machine Learning in Practice},
bib2html_funding = {DARPA},
}

• Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 156-163, May 2007. 22% acceptance rate

The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing they perform comparably to hand-coded mappings.
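
The essential step of the transfer functional can be sketched as follows (our simplification to a linear policy; the paper translates whole neural network populations): an inter-task mapping says which source state variable each target variable mirrors, and weights are copied accordingly, with novel variables zero-initialized.

```python
# Hedged sketch of policy transfer via an inter-task state mapping.
def transfer_weights(src_weights, state_mapping):
    """state_mapping[i] = index of the source variable that target
    variable i mirrors, or None for a novel variable (initialized to 0)."""
    return [src_weights[m] if m is not None else 0.0 for m in state_mapping]
```

An incomplete mapping simply leaves more entries as None, which matches the paper's observation that transfer can still help when only part of the mapping is known.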

@inproceedings{AAMAS07-taylor,
author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
title={{Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning}},
booktitle={{Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
pages="156--163",
month="May",
year="2007",
abstract={ The ambitious goal of transfer learning is to
accelerate learning on a target task after training on
a different, but related, source task. While many past
transfer methods have focused on transferring
value-functions, this paper presents a method for
transferring policies across tasks with different
state and action spaces. In particular, this paper
utilizes transfer via inter-task mappings for policy
search methods ({\sc tvitm-ps}) to construct a
transfer functional that translates a population of
neural network policies trained via policy search from
a source task to a target task. Empirical results in
robot soccer Keepaway and Server Job Scheduling show
that {\sc tvitm-ps} can markedly reduce learning time
when full inter-task mappings are available. The
results also demonstrate that {\sc tvitm-ps} still
succeeds when given only incomplete inter-task
mappings. Furthermore, we present a novel method for
learning such mappings when they are not
available, and give results showing they perform
comparably to hand-coded mappings. },
note = {22% acceptance rate},
wwwnote={<a href="http://www.aamas2007.nl/">AAMAS-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {DARPA, NSF}
}

• Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison. In Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI), pages 1675-1678, July 2007. Nectar Track, 38% acceptance rate

Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving difficult RL problems, but few rigorous comparisons have been conducted. Thus, no general guidelines describing the methods’ relative strengths and weaknesses are available. This paper summarizes a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. The results from this study help isolate the factors critical to the performance of each learning method and yield insights into their general strengths and weaknesses.

@inproceedings(AAAI07-taylor,
author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
title={{Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison}},
pages="1675--1678",
booktitle={{Proceedings of the Twenty-Second Conference on Artificial Intelligence ({AAAI})}},
month="July",
year="2007",
abstract="Reinforcement learning (RL) methods have become
popular in recent years because of their ability to solve
complex tasks with minimal feedback. Both genetic algorithms
(GAs) and temporal difference (TD) methods have proven
effective at solving difficult RL problems, but few rigorous
comparisons have been conducted. Thus, no general guidelines
describing the methods' relative strengths and weaknesses are
available. This paper summarizes a detailed empirical
comparison between a GA and a TD method in Keepaway, a
standard RL benchmark domain based on robot soccer. The
results from this study help isolate the factors critical to
the performance of each learning method and yield insights
into their general strengths and weaknesses.",
note = {Nectar Track, 38% acceptance rate},
wwwnote={<a href="http://www.aaai.org/Conferences/National/2007/aaai07.html">AAAI-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Genetic Algorithms},
bib2html_funding = {NSF, DARPA}
)

• Matthew E. Taylor and Peter Stone. Cross-Domain Transfer for Reinforcement Learning. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), June 2007. 29% acceptance rate

A typical goal for transfer learning algorithms is to utilize knowledge gained in a source task to learn a target task faster. Recently introduced transfer methods in reinforcement learning settings have shown considerable promise, but they typically transfer between pairs of very similar tasks. This work introduces Rule Transfer, a transfer algorithm that first learns rules to summarize a source task policy and then leverages those rules to learn faster in a target task. This paper demonstrates that Rule Transfer can effectively speed up learning in Keepaway, a benchmark RL problem in the robot soccer domain, based on experience from source tasks in the gridworld domain. We empirically show, through the use of three distinct transfer metrics, that Rule Transfer is effective across these domains.
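
A minimal sketch of the two Rule Transfer phases (our stand-in rule learner, not the paper's): summarize sampled (state, action) decisions of the source policy as simple per-action rules, then use those rules as an initial policy in the target task.

```python
# Hedged sketch of Rule Transfer: learn threshold-style rules from a
# source policy's decisions, then act on them in a new task.
def extract_rules(samples):
    """From (state_value, action) pairs, learn one prototype per action
    (a crude stand-in for the paper's rule learner)."""
    by_action = {}
    for x, a in samples:
        by_action.setdefault(a, []).append(x)
    return {a: sum(xs) / len(xs) for a, xs in by_action.items()}

def rule_policy(rules, x):
    """Pick the action whose learned prototype is closest to the state."""
    return min(rules, key=lambda a: abs(rules[a] - x))
```

Because the rules abstract away the source task's raw state representation, the same rules can seed learning in a target task from a different domain, which is the cross-domain aspect the paper emphasizes.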

@inproceedings(ICML07-taylor,
author="Matthew E. Taylor and Peter Stone",
title={{Cross-Domain Transfer for Reinforcement Learning}},
booktitle={{Proceedings of the Twenty-Fourth International Conference on Machine Learning ({ICML})}},
month="June",
year="2007",
abstract="A typical goal for transfer learning algorithms is
to utilize knowledge gained in a source task to learn a
target task faster. Recently introduced transfer methods in
reinforcement learning settings have shown considerable
promise, but they typically transfer between pairs of very
similar tasks. This work introduces Rule Transfer, a
transfer algorithm that first learns rules to summarize a
source task policy and then leverages those rules to learn
faster in a target task. This paper demonstrates that Rule
Transfer can effectively speed up learning in Keepaway, a
benchmark RL problem in the robot soccer domain, based on
experience from source tasks in the gridworld domain. We
empirically show, through the use of three distinct transfer
metrics, that Rule Transfer is effective across these
domains.",
note = {29% acceptance rate},
wwwnote={<a href="http://oregonstate.edu/conferences/icml2007">ICML-2007</a>},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {NSF, DARPA},
)

2006

• Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pages 1321-1328, July 2006. 46% acceptance rate, Best Paper Award in GA track (of 85 submissions)

Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods’ relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT, a GA that evolves neural networks, with Sarsa, a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.

@inproceedings{GECCO06-taylor,
author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
title={{Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning}},
booktitle={{Proceedings of the Genetic and Evolutionary Computation Conference ({GECCO})}},
month="July",
year="2006",
pages="1321--1328",
abstract={
Both genetic algorithms (GAs) and temporal
difference (TD) methods have proven effective at
solving reinforcement learning (RL) problems.
However, since few rigorous empirical comparisons
have been conducted, there are no general guidelines
describing the methods' relative strengths and
weaknesses. This paper presents the results of a
detailed empirical comparison between a GA and a TD
method in Keepaway, a standard RL benchmark domain
based on robot soccer. In particular, we compare
the performance of NEAT~\cite{stanley:ec02evolving},
a GA that evolves neural networks, with
Sarsa~\cite{Rummery94,Singh96}, a popular TD method.
The results demonstrate that NEAT can learn better
policies in this task, though it requires more
evaluations to do so. Additional experiments in two
variations of Keepaway demonstrate that Sarsa learns
better policies when the task is fully observable
and NEAT learns faster when the task is
deterministic. Together, these results help isolate
the factors critical to the performance of each
method and yield insights into their general
strengths and weaknesses.
},
note = {46\% acceptance rate, Best Paper Award in GA track (of 85 submissions)},
wwwnote={<span align="left" style="color: red; font-weight: bold">Best Paper Award</span> (Genetic Algorithms Track) at <a href="http://www.sigevo.org/gecco-2006/">GECCO-2006</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Genetic Algorithms, Machine Learning in Practice},
bib2html_funding = {NSF, DARPA}
}

2005

• Matthew E. Taylor and Peter Stone. Behavior Transfer for Value-Function-Based Reinforcement Learning. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 53-59, July 2005. 25% acceptance rate.

Temporal difference (TD) learning methods have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found very slow in practice. A key feature of TD methods is that they represent policies in terms of value functions. In this paper we introduce behavior transfer, a novel approach to speeding up TD learning by transferring the learned value function from one task to a second related task. We present experimental results showing that autonomous learners are able to learn one multiagent task and then use behavior transfer to markedly reduce the total training time for a more complex task.
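A bare-bones sketch of the core idea — initialize the target task's value function from the source task's learned values, routed through a mapping between the two tasks. The helper and mapping names here are hypothetical; the actual work transfers function-approximator weights between Keepaway variants via hand-coded inter-task mappings:

```python
def transfer_value_function(source_q, state_map, action_map,
                            target_states, target_actions):
    """Initialize a target-task Q-table from a learned source Q-table.

    state_map / action_map translate each target state or action into its
    closest source-task counterpart (the papers' inter-task mappings).
    Unmapped pairs fall back to 0.0, i.e. learning from scratch.
    """
    return {
        (s, a): source_q.get((state_map(s), action_map(a)), 0.0)
        for s in target_states
        for a in target_actions
    }

# Toy example: the target task has an extra action the source never saw,
# so it borrows the value of the most similar source action.
source_q = {(0, "hold"): 0.5, (0, "pass1"): 0.8}
q0 = transfer_value_function(
    source_q,
    state_map=lambda s: s,
    action_map=lambda a: "pass1" if a.startswith("pass") else a,
    target_states=[0],
    target_actions=["hold", "pass1", "pass2"],
)
# q0 starts the target learner from the source's estimates instead of zeros.
```

Starting from these transferred values rather than a zero-initialized table is what yields the reduction in total training time reported in the abstract.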

@inproceedings{AAMAS05-taylor,
author="Matthew E. Taylor and Peter Stone",
title={{Behavior Transfer for Value-Function-Based Reinforcement Learning}},
booktitle={{Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month="July",
year="2005",
pages="53--59",
abstract={
Temporal difference (TD) learning
methods have become popular
reinforcement learning techniques in recent years. TD
methods have had some experimental successes and have
been shown to exhibit some desirable properties in
theory, but have often been found very slow in
practice. A key feature of TD methods is that they
represent policies in terms of value functions. In
this paper we introduce \emph{behavior transfer}, a
novel approach to speeding up TD learning by
transferring the learned value function from one task
to a second related task. We present experimental
results showing that autonomous learners are able to
learn one multiagent task and then use behavior
transfer to markedly reduce the total training time
for a more complex task.
},
note = {25\% acceptance rate.},
wwwnote={<a href="http://www.aamas2005.nl/">AAMAS-2005</a>.<br> Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-JMLR07-taylor.html">Transfer Learning via Inter-Task Mappings for Temporal Difference Learning</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {DARPA, NSF},
}

• Matthew E. Taylor, Peter Stone, and Yaxin Liu. Value Functions for RL-Based Behavior Transfer: A Comparative Study. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI), July 2005. 18% acceptance rate.

Temporal difference (TD) learning methods have become popular reinforcement learning techniques in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found slow in practice. This paper presents methods for further generalizing across tasks, thereby speeding up learning, via a novel form of behavior transfer. We compare learning on a complex task with three function approximators, a CMAC, a neural network, and an RBF, and demonstrate that behavior transfer works well with all three. Using behavior transfer, agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup-soccer keepaway domain.
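Of the three function approximators compared, an RBF network is the easiest to sketch compactly. The toy below is a one-dimensional illustration with invented class and parameter names — the paper's approximators operate over Keepaway's multi-dimensional state — but it shows the property that matters for transfer: a TD update at one state generalizes to nearby states through overlapping basis functions:

```python
import math

class RBFValueFunction:
    """Tiny radial-basis-function approximator for V(s) on a 1-D state."""

    def __init__(self, centers, width=0.5):
        self.centers = centers
        self.width = width
        self.weights = [0.0] * len(centers)

    def features(self, s):
        # Gaussian activation of each basis function at state s.
        return [math.exp(-((s - c) ** 2) / (2 * self.width ** 2))
                for c in self.centers]

    def value(self, s):
        return sum(w * f for w, f in zip(self.weights, self.features(s)))

    def td_update(self, s, target, alpha=0.1):
        # Gradient step toward a TD target; the error spreads to nearby
        # states because their feature vectors overlap.
        err = target - self.value(s)
        for i, f in enumerate(self.features(s)):
            self.weights[i] += alpha * err * f

v = RBFValueFunction(centers=[0.0, 0.5, 1.0])
for _ in range(100):
    v.td_update(0.5, target=1.0)
# v.value(0.5) approaches the target, and a nearby state such as 0.4
# inherits most of that value without ever being updated directly.
```

A CMAC achieves a similar effect with overlapping discrete tilings instead of Gaussians, and a neural network with shared hidden units; the paper's finding is that behavior transfer helps under all three representations.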

@inproceedings(AAAI05-taylor,
author="Matthew E. Taylor and Peter Stone and Yaxin Liu",
title={{Value Functions for {RL}-Based Behavior Transfer: A Comparative Study}},
booktitle={{Proceedings of the Twentieth National Conference on Artificial Intelligence ({AAAI})}},
month="July",
year="2005",
abstract={
Temporal difference (TD) learning methods have
become popular reinforcement learning techniques in
recent years. TD methods, relying on function
approximators to generalize learning to novel
situations, have had some experimental successes and
have been shown to exhibit some desirable properties
in theory, but have often been found slow in
practice. This paper presents methods for further
generalizing across tasks, thereby speeding
up learning, via a novel form of behavior
transfer. We compare learning on a complex task
with three function approximators, a CMAC, a neural
network, and an RBF, and demonstrate that behavior
transfer works well with all three. Using behavior
transfer, agents are able to learn one task and then
markedly reduce the time it takes to learn a more
complex task. Our algorithms are fully implemented
and tested in the RoboCup-soccer keepaway domain.
},
note = {18\% acceptance rate.},
wwwnote={<a href="http://www.aaai.org/Conferences/National/2005/aaai05.html">AAAI-2005</a>. <br> Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-JMLR07-taylor.html">Transfer Learning via Inter-Task Mappings for Temporal Difference Learning</a>.},
bib2html_pubtype = {Refereed Conference},
bib2html_rescat = {Reinforcement Learning, Transfer Learning},
bib2html_funding = {NSF, DARPA}
)