Intelligent Robot Learning Laboratory (IRL Lab) Conference Papers

2017

  • Salam El Bsat, Haitham Bou Ammar, and Matthew E. Taylor. Scalable Multitask Policy Gradient Reinforcement Learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), February 2017. 25% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach that aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer from scalability issues when considering a large number of tasks. The main reason behind this limitation is the reliance on centralized solutions. This paper proposes a novel distributed multitask RL framework, improving scalability across many different types of tasks. Our framework maps multitask RL to an instance of general consensus and develops an efficient decentralized solver. We justify the correctness of the algorithm both theoretically and empirically: we first prove an improvement of convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks.

    @inproceedings{2017AAAI-ElBsat,
    author={El Bsat, Salam and Bou Ammar, Haitham and Taylor, Matthew E.},
    title={{Scalable Multitask Policy Gradient Reinforcement Learning}},
    booktitle={{Proceedings of the 31st {AAAI} Conference on Artificial Intelligence ({AAAI})}},
    month={February},
    year={2017},
    note={25% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning, Transfer Learning},
    abstract={Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach that aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer from scalability issues when considering a large number of tasks. The main reason behind this limitation is the reliance on centralized solutions. This paper proposes a novel distributed multitask RL framework, improving scalability across many different types of tasks. Our framework maps multitask RL to an instance of general consensus and develops an efficient decentralized solver. We justify the correctness of the algorithm both theoretically and empirically: we first prove an improvement of convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks.}
    }
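
    The framework above casts multitask policy search as a general consensus problem solved by a decentralized method. As a rough illustration of the consensus machinery only, and not the paper's actual solver, the sketch below runs a standard consensus-ADMM loop over per-task parameter vectors with toy quadratic local objectives; the penalty rho, the losses, and all names are illustrative assumptions.

    import numpy as np

    # Toy stand-in for per-task objectives: task i wants parameters near targets[i].
    # The paper's objectives would be policy-gradient losses; quadratics keep the
    # sketch self-contained and give a closed-form local step.
    rng = np.random.default_rng(0)
    num_tasks, dim, rho = 5, 3, 1.0
    targets = rng.normal(size=(num_tasks, dim))

    theta = np.zeros((num_tasks, dim))  # local parameter copies, one per task
    u = np.zeros((num_tasks, dim))      # scaled dual variables
    z = np.zeros(dim)                   # shared (consensus) parameters

    for _ in range(200):
        # Local step: minimize 0.5*||theta_i - targets_i||^2 + 0.5*rho*||theta_i - z + u_i||^2
        theta = (targets + rho * (z - u)) / (1.0 + rho)
        z = (theta + u).mean(axis=0)    # consensus step: average the local estimates
        u += theta - z                  # dual update

    print("consensus parameters:", z)   # converges to the mean of the toy targets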

2016

  • Chris Cain, Anne Anderson, and Matthew E. Taylor. Content-Independent Classroom Gamification. In Proceedings of the ASEE’s 123rd Annual Conference & Exposition, New Orleans, LA, USA, June 2016.
    [BibTeX] [Abstract] [Download PDF]

    This paper introduces Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person study was implemented in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.

    @inproceedings{2016ASEE-Cain,
    author={Chris Cain and Anne Anderson and Matthew E. Taylor},
    title={{Content-Independent Classroom Gamification}},
    booktitle={{Proceedings of the {ASEE}'s 123rd Annual Conference \& Exposition}},
    month={June},
    year={2016},
    address={New Orleans, LA, USA},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Gamification, Motivation, Education},
    abstract={This paper introduces Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person study was implemented in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.}
    }

  • Yang Hu and Matthew E. Taylor. Work In Progress: A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility. In Proceedings of the ASEE’s 123rd Annual Conference & Exposition, New Orleans, LA, USA, June 2016.
    [BibTeX] [Abstract] [Download PDF]

    Taking a Computer-Aided Design (CAD) class is a prerequisite for Mechanical Engineering freshmen at many universities, including at Washington State University. The traditional way to learn CAD software is to follow examples and exercises in a textbook. However, using written instruction is not always effective because textbooks usually support a single strategy to construct a model. Missing even one detail may cause the student to become stuck, potentially leading to frustration. To make the learning process easier and more interesting, we designed and implemented an intelligent tutorial system for an open source CAD program, FreeCAD, for the sake of teaching students some basic CAD skills (such as Boolean operations) to construct complex objects from multiple simple shapes. Instead of teaching a single method to construct a model, the program first automatically learns all possible ways to construct a model and then can teach the student to draw the 3D model in multiple ways. Previous research efforts have shown that learning multiple potential solutions can encourage students to develop the tools they need to solve new problems. This study compares textbook learning with learning from two variants of our intelligent tutoring system. The textbook approach is considered the baseline. In the first tutorial variant, subjects were given minimal guidance and were asked to construct a model in multiple ways. Subjects in the second tutorial group were given two guided solutions to constructing a model and then asked to demonstrate the third solution when constructing the same model. Rather than directly providing instructions, participants in the second tutorial group were expected to independently explore and were only provided feedback when the program determined he/she had deviated too far from a potential solution. The three groups are compared by measuring the time needed to 1) successfully construct the same model in a testing phase, 2) use multiple methods to construct the same model in a testing phase, and 3) construct a novel model.

    @inproceedings{2016ASEE-Hu,
    author={Yang Hu and Matthew E. Taylor},
    title={{Work In Progress: A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility}},
    booktitle={{Proceedings of the {ASEE}'s 123rd Annual Conference \& Exposition}},
    month={June},
    year={2016},
    address={New Orleans, LA, USA},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Intelligent Tutoring System, Multiple solutions},
    abstract={Taking a Computer-Aided Design (CAD) class is a prerequisite for Mechanical Engineering freshmen at many universities, including at Washington State University. The traditional way to learn CAD software is to follow examples and exercises in a textbook. However, using written instruction is not always effective because textbooks usually support a single strategy to construct a model. Missing even one detail may cause the student to become stuck, potentially leading to frustration.
    To make the learning process easier and more interesting, we designed and implemented an intelligent tutorial system for an open source CAD program, FreeCAD, for the sake of teaching students some basic CAD skills (such as Boolean operations) to construct complex objects from multiple simple shapes. Instead of teaching a single method to construct a model, the program first automatically learns all possible ways to construct a model and then can teach the student to draw the 3D model in multiple ways. Previous research efforts have shown that learning multiple potential solutions can encourage students to develop the tools they need to solve new problems.
    This study compares textbook learning with learning from two variants of our intelligent tutoring system. The textbook approach is considered the baseline. In the first tutorial variant, subjects were given minimal guidance and were asked to construct a model in multiple ways. Subjects in the second tutorial group were given two guided solutions to constructing a model and then asked to demonstrate the third solution when constructing the same model. Rather than directly providing instructions, participants in the second tutorial group were expected to independently explore and were only provided feedback when the program determined he/she had deviated too far from a potential solution. The three groups are compared by measuring the time needed to 1) successfully construct the same model in a testing phase, 2) use multiple methods to construct the same model in a testing phase, and 3) construct a novel model.}
    }

  • David Isele, José Marcio Luna, Eric Eaton, Gabriel V. de la Cruz Jr., James Irwin, Brandon Kallaher, and Matthew E. Taylor. Lifelong Learning for Disturbance Rejection on Mobile Robots. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2016. 48% acceptance rate
    [BibTeX] [Abstract] [Download PDF] [Video]

    No two robots are exactly the same—even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Furthermore, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled.

    @inproceedings{2016IROS-Isele,
    author={Isele, David and Luna, Jos\'e Marcio and Eaton, Eric and de la Cruz, Jr., Gabriel V. and Irwin, James and Kallaher, Brandon and Taylor, Matthew E.},
    title={{Lifelong Learning for Disturbance Rejection on Mobile Robots}},
    booktitle={{Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems ({IROS})}},
    month={October},
    year={2016},
    note={48% acceptance rate},
    video={https://youtu.be/u7pkhLx0FQ0},
    bib2html_pubtype={Refereed Conference},
    abstract={No two robots are exactly the same—even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Furthermore, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled.}
    }

  • Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016. 24.9% acceptance rate
    [BibTeX] [Abstract] [Download PDF] [Video]

    As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work presents a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.

    @inproceedings{2016AAMAS-Peng,
    author={Bei Peng and James MacGlashan and Robert Loftin and Michael L. Littman and David L. Roberts and Matthew E. Taylor},
    title={{A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans}},
    booktitle={{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month={May},
    year={2016},
    note={24.9% acceptance rate},
    video={https://www.youtube.com/watch?v=AJQSGD_XPrk},
    bib2html_pubtype={Refereed Conference},
    abstract={As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work presents a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.}
    }
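
    The adaptive-speed agent above modulates how quickly it executes actions so that the trainer has time to give feedback where the policy is uncertain. Below is a minimal sketch of one way to realize that idea, assuming a tabular Q-function and using the spread of Q-values as a stand-in for uncertainty; the delay bounds and function names are illustrative assumptions, not the authors' exact mechanism.

    import time
    import numpy as np

    def action_delay(q_values, min_delay=0.2, max_delay=2.0):
        """Map the spread of Q-values in the current state to a pause length:
        a small spread suggests the agent is unsure, so it acts slowly to
        invite trainer feedback; a large spread lets it act quickly."""
        spread = float(np.max(q_values) - np.min(q_values))
        confidence = spread / (spread + 1.0)          # squash into [0, 1)
        return max_delay - confidence * (max_delay - min_delay)

    # Hypothetical use inside an interactive training loop:
    q_s = np.array([0.10, 0.12, 0.09])                # nearly tied -> long pause
    time.sleep(action_delay(q_s))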

  • Halit Bener Suay, Tim Brys, Matthew E. Taylor, and Sonia Chernova. Learning from Demonstration for Shaping through Inverse Reinforcement Learning. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016. 24.9% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel approach to improve model-free reinforcement learning agents’ performance with a three step approach. Specifically, we collect demonstration data, use the data to recover a linear function using inverse reinforcement learning and we use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high dimensional domains. To show the scalability of our approach we present two sets of experiments in a two dimensional Maze domain, and the 27 dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state-of-the-art in cumulative reward, learning rate and asymptotic performance.

    @inproceedings{2016AAMAS-Suay,
    author={Suay, Halit Bener and Brys, Tim and Taylor, Matthew E. and Chernova, Sonia},
    title={{Learning from Demonstration for Shaping through Inverse Reinforcement Learning}},
    booktitle={{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month={May},
    year={2016},
    note={24.9% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    abstract={Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel approach to improve model-free reinforcement learning agents’ performance with a three step approach. Specifically, we collect demonstration data, use the data to recover a linear function using inverse reinforcement learning and we use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high dimensional domains. To show the scalability of our approach we present two sets of experiments in a two dimensional Maze domain, and the 27 dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state-of-the-art in cumulative reward, learning rate and asymptotic performance.}
    }
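
    The three-step approach above ends by turning an IRL-recovered linear reward function into a potential for reward shaping. The fragment below sketches only that final step, a tabular Q-learning update with a potential-based shaping term F(s, s') = gamma*Phi(s') - Phi(s); the feature map phi and the weight vector w, assumed to come from an earlier IRL step, are placeholders rather than the paper's implementation.

    import numpy as np
    from collections import defaultdict

    def shaped_q_update(Q, actions, phi, w, s, a, r, s_next, alpha=0.1, gamma=0.99):
        """One Q-learning step with potential-based reward shaping, where the
        potential Phi(s) = w . phi(s) is linear in features (w from an IRL step)."""
        potential = lambda state: float(np.dot(w, phi(state)))
        shaping = gamma * potential(s_next) - potential(s)      # F(s, s')
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + shaping + gamma * best_next - Q[(s, a)])

    # Hypothetical usage: a defaultdict Q-table so unseen (state, action) pairs
    # start at zero, with phi and w supplied by the demonstration/IRL stage.
    Q = defaultdict(float)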

  • Yusen Zhan, Haitham Bou Ammar, and Matthew E. Taylor. Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), July 2016. 25% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher’s advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.

    @inproceedings{2016IJCAI-Zhan,
    author={Yusen Zhan and Haitham Bou Ammar and Matthew E. Taylor},
    title={{Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer}},
    booktitle={{Proceedings of the 25th International Joint Conference on Artificial Intelligence ({IJCAI})}},
    month={July},
    year={2016},
    note={25% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    abstract={Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher’s advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.}
    }

2015

  • Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, and Matthew E. Taylor. Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), January 2015. 27% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    The success of applying policy gradient reinforcement learning (RL) to difficult control tasks hinges crucially on the ability to determine a sensible initialization for the policy. Transfer learning methods tackle this problem by reusing knowledge gleaned from solving other related tasks. In the case of multiple task domains, these algorithms require an inter-task mapping to facilitate knowledge transfer across domains. However, there are currently no general methods to learn an inter-task mapping without requiring either background knowledge that is not typically present in RL settings, or an expensive analysis of an exponential number of inter-task mappings in the size of the state and action spaces. This paper introduces an autonomous framework that uses unsupervised manifold alignment to learn intertask mappings and effectively transfer samples between different task domains. Empirical results on diverse dynamical systems, including an application to quadrotor control, demonstrate its effectiveness for cross-domain transfer in the context of policy gradient RL.

    @inproceedings{2015AAAI-BouAamar,
    author={Haitham Bou Ammar and Eric Eaton and Paul Ruvolo and Matthew E. Taylor},
    title={{Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment}},
    booktitle={{Proceedings of the 29th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
    month={January},
    year={2015},
    note={27% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning, Transfer Learning},
    abstract={The success of applying policy gradient reinforcement learning (RL) to difficult control tasks hinges crucially on the ability to determine a sensible initialization for the policy. Transfer learning methods tackle this problem by reusing knowledge gleaned from solving other related tasks. In the case of multiple task domains, these algorithms require an inter-task mapping to facilitate knowledge transfer across domains. However, there are currently no general methods to learn an inter-task mapping without requiring either background knowledge that is not typically present in RL settings, or an expensive analysis of an exponential number of inter-task mappings in the size of the state and action spaces. This paper introduces an autonomous framework that uses unsupervised manifold alignment to learn intertask mappings and effectively transfer samples between different task domains. Empirical results on diverse dynamical systems, including an application to quadrotor control, demonstrate its effectiveness for cross-domain transfer in the context of policy gradient RL.},
    }

  • Tim Brys, Anna Harutyunyan, Matthew E. Taylor, and Ann Nowé. Policy Transfer using Reward Shaping. In The 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2015. 25% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, a most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespective of the learning algorithm used in the source task. We advance the state-of-the-art by using a reward shaping approach to policy transfer. One of the advantages in following such an approach, is that it firmly grounds policy transfer in an actively developing body of theoretical research on reward shaping. Experiments in Mountain Car, Cart Pole and Mario demonstrate the practical usefulness of the approach.

    @inproceedings{2015AAMAS-Brys,
    author={Tim Brys and Anna Harutyunyan and Matthew E. Taylor and Ann Now\'{e}},
    title={{Policy Transfer using Reward Shaping}},
    booktitle={{The 14th International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month={May},
    year={2015},
    note={25% acceptance rate},
    bib2html_rescat={Reinforcement Learning, Transfer Learning},
    bib2html_pubtype={Refereed Conference},
    abstract={Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, a most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespective of the learning algorithm used in the source task. We advance the state-of-the-art by using a reward shaping approach to policy transfer. One of the advantages in following such an approach, is that it firmly grounds policy transfer in an actively developing body of theoretical research on reward shaping. Experiments in Mountain Car, Cart Pole and Mario demonstrate the practical usefulness of the approach.},
    }

  • Tim Brys, Anna Harutyunyan, Halit Bener Suay, Sonia Chernova, Matthew E. Taylor, and Ann Nowé. Reinforcement Learning from Demonstration through Shaping. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015. 28.8% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Reinforcement learning describes how a learning agent can achieve optimal behaviour based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behaviour. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus the learned behavior. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.

    @inproceedings{2015IJCAI-Brys,
    author={Tim Brys and Anna Harutyunyan and Halit Bener Suay and Sonia Chernova and Matthew E. Taylor and Ann Now\'e},
    title={{Reinforcement Learning from Demonstration through Shaping}},
    booktitle={{Proceedings of the International Joint Conference on Artificial Intelligence ({IJCAI})}},
    year={2015},
    note={28.8% acceptance rate},
    bib2html_rescat={Reinforcement Learning},
    bib2html_pubtype={Refereed Conference},
    abstract={Reinforcement learning describes how a learning agent can achieve optimal behaviour based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behaviour. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus the learned behavior. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.}
    }

2014

  • Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, and Matthew E. Taylor. Online Multi-Task Learning for Policy Gradient Methods. In Proceedings of the 31st International Conference on Machine Learning (ICML), June 2014. 25% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{2014ICML-BouAmmar,
    author={Haitham Bou Ammar and Eric Eaton and Paul Ruvolo and Matthew E. Taylor},
    title={{Online Multi-Task Learning for Policy Gradient Methods}},
    booktitle={{Proceedings of the 31st International Conference on Machine Learning ({ICML})}},
    note={25% acceptance rate},
    month={June},
    year={2014},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Transfer Learning, Reinforcement Learning},
    }

  • Tim Brys, Anna Harutyunyan, Peter Vrancx, Matthew E. Taylor, Daniel Kudenko, and Ann Nowé. Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping. In Proceedings of the IEEE 2014 International Joint Conference on Neural Networks (IJCNN), July 2014. 59% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{2014IJCNN-Brys,
    author={Tim Brys and Anna Harutyunyan and Peter Vrancx and Matthew E. Taylor and Daniel Kudenko and Ann Now\'{e}},
    title={{Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping}},
    booktitle={{Proceedings of the {IEEE} 2014 International Joint Conference on Neural Networks ({IJCNN})}},
    month={July},
    year={2014},
    note={59% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning},
    }

  • Tim Brys, Ann Nowé, Daniel Kudenko, and Matthew E. Taylor. Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), July 2014. 28% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{2014AAAI-Brys,
    author={Tim Brys and Ann Now\'{e} and Daniel Kudenko and Matthew E. Taylor},
    title={{Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence}},
    booktitle={{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
    month={July},
    year={2014},
    note={28% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning},
    }

  • Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. An Autonomous Transfer Learning Algorithm for TD-Learners. In Proceedings of the 8th Hellenic Conference on Artificial Intelligence (SETN), May 2014. 50% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{2014SETN-Fachantidis,
    author={Anestis Fachantidis and Ioannis Partalas and Matthew E. Taylor and Ioannis Vlahavas},
    title={{An Autonomous Transfer Learning Algorithm for TD-Learners}},
    booktitle={{Proceedings of the 8th Hellenic Conference on Artificial Intelligence ({SETN})}},
    note={50% acceptance rate},
    month={May},
    year={2014},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Transfer Learning, Reinforcement Learning},
    }

  • Chris HolmesParker, Matthew E. Taylor, Adrian Agogino, and Kagan Tumer. CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning. In Proceedings of the 2014 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT), August 2014. 43% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{2014IAT-HolmesParker,
    author={Chris HolmesParker and Matthew E. Taylor and Adrian Agogino and Kagan Tumer},
    title={{{CLEAN}ing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning}},
    booktitle={{Proceedings of the 2014 {IEEE/WIC/ACM} International Conference on Intelligent Agent Technology ({IAT})}},
    month={August},
    year={2014},
    note={43% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning},
    }

  • Robert Loftin, Bei Peng, James MacGlashan, Michael Littman, Matthew E. Taylor, David Roberts, and Jeff Huang. Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), August 2014.
    [BibTeX] [Download PDF]
    @inproceedings{2014ROMAN-Loftin,
    author={Robert Loftin and Bei Peng and James MacGlashan and Michael Littman and Matthew E. Taylor and David Roberts and Jeff Huang},
    title={{Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies}},
    booktitle={{Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication ({RO-MAN})}},
    month={August},
    year={2014},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning},
    }

  • Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, Jeff Huang, and David L. Roberts. A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), July 2014. 28% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{2014AAAI-Loftin,
    author={Robert Loftin and Bei Peng and James MacGlashan and Michael L. Littman and Matthew E. Taylor and Jeff Huang and David L. Roberts},
    title={{A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback}},
    booktitle={{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ({AAAI})}},
    month={July},
    year={2014},
    note={28% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning},
    }

  • Matthew E. Taylor and Lisa Torrey. Agents Teaching Agents in Reinforcement Learning (Nectar Abstract). In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD), September 2014. Nectar Track, 45% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{2014ECML-Taylor,
    author={Matthew E. Taylor and Lisa Torrey},
    title={{Agents Teaching Agents in Reinforcement Learning (Nectar Abstract)}},
    booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD)}},
    month={September},
    year={2014},
    note={Nectar Track, 45% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning},
    }

2013

  • Haitham Bou Ammar, Decebal Constantin Mocanu, Matthew E. Taylor, Kurt Driessens, Karl Tuyls, and Gerhard Weiss. Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), September 2013. 25% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{ECML13-BouAamar,
    author={Haitham Bou Ammar and Decebal Constantin Mocanu and Matthew E. Taylor and Kurt Driessens and Karl Tuyls and Gerhard Weiss},
    title={{Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines}},
    booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ({ECML PKDD})}},
    month={September},
    year = {2013},
    note = {25% acceptance rate},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Transfer Learning, Reinforcement Learning},
    }

  • Tong Pham, Aly Tawfika, and Matthew E. Taylor. A Simple, Naive Agent-based Model for the Optimization of a System of Traffic Lights: Insights from an Exploratory Experiment. In Proceedings of Conference on Agent-Based Modeling in Transportation Planning and Operations, September 2013.
    [BibTeX] [Download PDF]
    @inproceedings{abm13-Pham,
    author="Tong Pham and Aly Tawfika and Matthew E. Taylor",
    title={{A Simple, Naive Agent-based Model for the Optimization of a System of Traffic Lights: Insights from an Exploratory Experiment}},
    booktitle={{Proceedings of Conference on Agent-Based Modeling in Transportation Planning and Operations}},
    month="September",
    year = {2013},
    bib2html_rescat = {DCOP},
    bib2html_pubtype = {Refereed Conference},
    }

  • Lisa Torrey and Matthew E. Taylor. Teaching on a Budget: Agents Advising Agents in Reinforcement Learning. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2013. 23% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    This paper introduces a teacher-student framework for reinforcement learning. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two experimental domains: Mountain Car and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

    @inproceedings{AAMAS13-Torrey,
    author="Lisa Torrey and Matthew E. Taylor",
    title={{Teaching on a Budget: Agents Advising Agents in Reinforcement Learning}},
    booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month="May",
    year = {2013},
    note = {23% acceptance rate},
    wwwnote = {<a href="aamas2013.cs.umn.edu/">AAMAS-13</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Transfer Learning, Reinforcement Learning},
    abstract = "This paper introduces a teacher-student framework for reinforcement
    learning. In this framework, a teacher agent instructs a student
    agent by suggesting actions the student should take as it learns.
    However, the teacher may only give such advice a limited number
    of times. We present several novel algorithms that teachers can
    use to budget their advice effectively, and we evaluate them in two
    experimental domains: Mountain Car and Pac-Man. Our results
    show that the same amount of advice, given at different moments,
    can have different effects on student learning, and that teachers can
    significantly affect student learning even when students use different
    learning methods and state representations.",
    }
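
    The teacher-student framework above limits how many pieces of advice the teacher may give. One natural budget heuristic in this line of work spends advice only in states the teacher considers important, measured by the gap between its best and worst Q-values; the sketch below illustrates that rule, with the threshold and data layout being illustrative assumptions rather than the paper's exact algorithms.

    def maybe_advise(teacher_q, state, budget, threshold=1.0):
        """Return (advice_action or None, remaining_budget).

        teacher_q maps state -> {action: value} under the teacher's policy.
        Advice is spent only when the choice of action matters, i.e. when the
        teacher's Q-values in this state are spread far apart."""
        if budget <= 0 or state not in teacher_q:
            return None, budget
        q = teacher_q[state]
        importance = max(q.values()) - min(q.values())
        if importance >= threshold:
            return max(q, key=q.get), budget - 1
        return None, budget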

2012

  • Haitham Bou Ammar, Karl Tuyls, Matthew E. Taylor, Kurt Driessens, and Gerhard Weiss. Reinforcement Learning Transfer via Sparse Coding. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), June 2012. 20% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Although reinforcement learning (RL) has been successfully deployed in a variety of tasks, learning speed remains a fundamental problem for applying RL in complex environments. Transfer learning aims to ameliorate this shortcoming by speeding up learning through the adaptation of previously learned behaviors in similar tasks. Transfer techniques often use an inter-task mapping, which determines how a pair of tasks are related. Instead of relying on a hand-coded inter-task mapping, this paper proposes a novel transfer learning method capable of autonomously creating an inter-task mapping by using a novel combination of sparse coding, sparse projection learning and sparse Gaussian processes. We also propose two new transfer algorithms (TrLSPI and TrFQI) based on least squares policy iteration and fitted-Q-iteration. Experiments not only show successful transfer of information between similar tasks, inverted pendulum to cart pole, but also between two very different domains: mountain car to cart pole. This paper empirically shows that the learned inter-task mapping can be successfully used to (1) improve the performance of a learned policy on a fixed number of environmental samples, (2) reduce the learning times needed by the algorithms to converge to a policy on a fixed number of samples, and (3) converge faster to a near-optimal policy given a large number of samples.

    @inproceedings{12AAMAS-Haitham,
    author="Haitham Bou Ammar and Karl Tuyls and Matthew E. Taylor and Kurt Driessen and Gerhard Weiss",
    title={{Reinforcement Learning Transfer via Sparse Coding}},
    booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month="June",
    year = {2012},
    note = {20% acceptance rate},
    wwwnote = {<a href="http://aamas2012.webs.upv.es">AAMAS-12</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Transfer Learning, Reinforcement Learning},
    abstract = "Although reinforcement learning (RL) has been successfully deployed
    in a variety of tasks, learning speed remains a fundamental
    problem for applying RL in complex environments. Transfer learning
    aims to ameliorate this shortcoming by speeding up learning
    through the adaptation of previously learned behaviors in similar
    tasks. Transfer techniques often use an inter-task mapping, which
    determines how a pair of tasks are related. Instead of relying on a
    hand-coded inter-task mapping, this paper proposes a novel transfer
    learning method capable of autonomously creating an inter-task
    mapping by using a novel combination of sparse coding, sparse
    projection learning and sparse Gaussian processes. We also propose
    two new transfer algorithms (TrLSPI and TrFQI) based on
    least squares policy iteration and fitted-Q-iteration. Experiments
    not only show successful transfer of information between similar
    tasks, inverted pendulum to cart pole, but also between two very
    different domains: mountain car to cart pole. This paper empirically
    shows that the learned inter-task mapping can be successfully
    used to (1) improve the performance of a learned policy on a fixed
    number of environmental samples, (2) reduce the learning times
    needed by the algorithms to converge to a policy on a fixed number
    of samples, and (3) converge faster to a near-optimal policy given
    a large number of samples.",
    }

2011

  • Matthew E. Taylor, Halit Bener Suay, and Sonia Chernova. Integrating Reinforcement Learning with Human Demonstrations of Varying Ability. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{11AAMAS-HAT-Taylor,
    author="Matthew E. Taylor and Halit Bener Suay and Sonia Chernova",
    title = {{Integrating Reinforcement Learning with Human Demonstrations of Varying Ability}},
    booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month="May",
    year = {2011},
    note = {22% acceptance rate},
    wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Transfer Learning, Reinforcement Learning},
    }

  • Matthew E. Taylor, Brian Kulis, and Fei Sha. Metric Learning for Reinforcement Learning Agents. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{11AAMAS-MetricLearn-Taylor,
    author="Matthew E. Taylor and Brian Kulis and Fei Sha",
    title = {{Metric Learning for Reinforcement Learning Agents}},
    booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month="May",
    year = {2011},
    note = {22% acceptance rate},
    wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning},
    }

  • Jason Tsai, Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein, Gal Kaminka, Stacy Marsella, Andrew Ogden, Inbal Rika, Ankur Sheel, Matthew E. Taylor, Xuezhi Wang, Avishay Zilka, and Milind Tambe. ESCAPES: Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social Comparison. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2011. 22% acceptance rate
    [BibTeX] [Download PDF]
    @inproceedings{11AAMAS-Tsai,
    author={Jason Tsai and Natalie Fridman and Emma Bowring and Matthew Brown and Shira Epstein and Gal Kaminka and Stacy Marsella and Andrew Ogden and Inbal Rika and Ankur Sheel and Matthew E. Taylor and {Xuezhi Wang} and Avishay Zilka and Milind Tambe},
    title = {{ESCAPES: Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social Comparison}},
    booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month="May",
    year = {2011},
    note = {22% acceptance rate},
    wwwnote = {<a href="http://aamas2011.tw">AAMAS-11</a>},
    bib2html_pubtype = {Refereed Conference},
    }

2010

  • Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, and Milind Tambe. When Should There be a “Me” in “Team”? Distributed Multi-Agent Optimization Under Uncertainty. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2010. 24% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Increasing teamwork between agents typically increases the performance of a multi-agent system, at the cost of increased communication and higher computational complexity. This work examines joint actions in the context of a multi-agent optimization problem where agents must cooperate to balance exploration and exploitation. Surprisingly, results show that increased teamwork can hurt agent performance, even when communication and computation costs are ignored, which we term the team uncertainty penalty. This paper introduces the above phenomena, analyzes it, and presents algorithms to reduce the effect of the penalty in our problem setting.

    @inproceedings{AAMAS10-Taylor,
    author = {Matthew E. Taylor and Manish Jain and Yanquin Jin and Makoto Yokoo and Milind Tambe},
    title = {{When Should There be a ``Me'' in ``Team''? {D}istributed Multi-Agent Optimization Under Uncertainty}},
    booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month="May",
    year = {2010},
    note = {24% acceptance rate},
    wwwnote = {<a href="http://www.cse.yorku.ca/AAMAS2010/index.php>AAMAS-10</a>},
    abstract={Increasing teamwork between agents typically increases the
    performance of a multi-agent system, at the cost of increased
    communication and higher computational complexity. This work examines
    joint actions in the context of a multi-agent optimization problem
    where agents must cooperate to balance exploration and
    exploitation. Surprisingly, results show that increased teamwork can
    hurt agent performance, even when communication and computation costs
    are ignored, which we term the team uncertainty penalty. This paper
    introduces the above phenomena, analyzes it, and presents algorithms
    to reduce the effect of the penalty in our problem setting.},
    wwwnote={Supplemental material is available at <a href="http://teamcore.usc.edu/dcop/">http://teamcore.usc.edu/dcop/</a>.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {DCOP},
    }

  • Matthew E. Taylor, Katherine E. Coons, Behnam Robatmili, Bertrand A. Maher, Doug Burger, and Kathryn S. McKinley. Evolving Compiler Heuristics to Manage Communication and Contention. In Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI), July 2010. Nectar Track, 25% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    As computer architectures become increasingly complex, hand-tuning compiler heuristics becomes increasingly tedious and time consuming for compiler developers. This paper presents a case study that uses a genetic algorithm to learn a compiler policy. The target policy implicitly balances communication and contention among processing elements of the TRIPS processor, a physically realized prototype chip. We learn specialized policies for individual programs as well as general policies that work well across all programs. We also employ a two-stage method that first classifies the code being compiled based on salient characteristics, and then chooses a specialized policy based on that classification. This work is particularly interesting for the AI community because it (1) emphasizes the need for increased collaboration between AI researchers and researchers from other branches of computer science and (2) discusses a machine learning setup where training on the custom hardware requires weeks of training, rather than the more typical minutes or hours.

    @inproceedings(AAAI10-Nectar-taylor,
    author="Matthew E. Taylor and Katherine E. Coons and Behnam Robatmili and Bertrand A. Maher and Doug Burger and Kathryn S. McKinley",
    title={{Evolving Compiler Heuristics to Manage Communication and Contention}},
    note = "Nectar Track, 25% acceptance rate",
    booktitle={{Proceedings of the Twenty-Fourth Conference on Artificial Intelligence ({AAAI})}},
    month="July",year="2010",
    abstract="
    As computer architectures become increasingly complex, hand-tuning
    compiler heuristics becomes increasingly tedious and time consuming
    for compiler developers. This paper presents a case study that uses a
    genetic algorithm to learn a compiler policy. The target policy
    implicitly balances communication and contention among processing
    elements of the TRIPS processor, a physically realized prototype chip.
    We learn specialized policies for individual programs as well as
    general policies that work well across all programs. We also employ a
    two-stage method that first classifies the code being compiled based
    on salient characteristics, and then chooses a specialized policy
    based on that classification.
    <br>
    This work is particularly interesting for the AI community because it
    (1) emphasizes the need for increased collaboration between AI
    researchers and researchers from other branches of computer science
    and (2) discusses a machine learning setup where training on the custom
    hardware requires weeks of training, rather than the more typical
    minutes or hours.",
    wwwnote={<a href="http://www.aaai.org/Conferences/AAAI/aaai10.php">AAAI-2010</a>. This paper is based on results presented in our earlier <a href="b2hd-PACT08-coons.html">PACT-08 paper</a>.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Genetic Algorithms},
    bib2html_funding = {NSF, DARPA}
    )
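
    The case study above replaces hand-tuning of an instruction-placement heuristic with learned policies. As a loose illustration of the evolutionary loop only, not NEAT and not the TRIPS cost model, the sketch below evolves a small weight vector for a hypothetical placement cost function, with a stand-in fitness in place of compiling and timing real benchmarks.

    import numpy as np

    rng = np.random.default_rng(0)

    def fitness(weights):
        """Stand-in for 'compile the benchmarks with these heuristic weights and
        measure speedup'; a real evaluation would invoke the compiler/simulator."""
        return -float(np.sum((weights - np.array([0.5, -1.0, 2.0])) ** 2))

    pop = rng.normal(size=(20, 3))                   # population of weight vectors
    for _ in range(50):
        scores = np.array([fitness(w) for w in pop])
        parents = pop[np.argsort(scores)[-10:]]      # keep the fittest half
        children = parents + rng.normal(scale=0.1, size=parents.shape)  # mutate
        pop = np.vstack([parents, children])

    best = max(pop, key=fitness)
    print("evolved heuristic weights:", best)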

2009

  • Manish Jain, Matthew E. Taylor, Makoto Yokoo, and Milind Tambe. DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI), July 2009. 26% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.

    @inproceedings(IJCAI09-Jain,
    author="Manish Jain and Matthew E. Taylor and Makoto Yokoo and Milind Tambe",
    title={{{DCOP}s Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks}},
    booktitle={{Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence ({IJCAI})}},
    month="July",
    year= "2009",
    note = {26% acceptance rate},
    wwwnote={<a href="http://www.ijcai-09.org">IJCAI-2009</a>},
    abstract={Buoyed by recent successes in the area of distributed
    constraint optimization problems (DCOPs), this paper addresses
    challenges faced when applying DCOPs to real-world domains. Three
    fundamental challenges must be addressed for a class of real-world
    domains, requiring novel DCOP algorithms. First, agents may not
    know the payoff matrix and must explore the environment to
    determine rewards associated with variable settings. Second,
    agents may need to maximize total accumulated reward rather than
    instantaneous final reward. Third, limited time horizons disallow
    exhaustive exploration of the environment. We propose and
    implement a set of novel algorithms that combine
    decision-theoretic exploration approaches with DCOP-mandated
    coordination. In addition to simulation results, we implement
    these algorithms on robots, deploying DCOPs on a distributed
    mobile sensor network.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {DCOP, Robotics},
    bib2html_funding = {DARPA}
    )
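
    In the setting above, agents must explore unknown reward matrices under a limited time horizon. The toy rule below sketches one agent's decision-theoretic trade-off between reusing its best observed variable setting and trying an unexplored one, using an optimistic guess about unknown rewards; it illustrates the exploration/exploitation tension only and is not one of the paper's algorithms.

    def choose_setting(reward_estimates, unexplored, rounds_left, optimistic_value=1.0):
        """Pick this agent's variable setting for the next round.

        reward_estimates maps already-tried settings to their mean observed reward
        (assumed non-empty); unexplored lists settings never tried. Explore only if
        an optimistic guess about a new setting, enjoyed over the remaining rounds,
        beats repeating the best known setting for the rest of the horizon."""
        best_setting = max(reward_estimates, key=reward_estimates.get)
        best_known = reward_estimates[best_setting]
        if not unexplored or rounds_left <= 1:
            return best_setting
        stay_value = rounds_left * best_known
        explore_value = optimistic_value + (rounds_left - 1) * max(optimistic_value, best_known)
        return unexplored[0] if explore_value > stay_value else best_setting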

  • Pradeep Varakantham, Jun-young Kwak, Matthew E. Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping. In Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling (ICAPS), September 2009. 34% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

    @inproceedings(ICAPS09-Varakantham,
    author="Pradeep Varakantham and Jun-young Kwak and Matthew E. Taylor and Janusz Marecki and Paul Scerri and Milind Tambe",
    title={{Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping}},
    booktitle={{Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling ({ICAPS})}},
    month="September",
    year= "2009",
    note = {34% acceptance rate},
    wwwnote={<a href="http://icaps09.uom.gr">ICAPS-2009</a>},
    abstract={ Distributed POMDPs provide an expressive framework for
    modeling multiagent collaboration problems, but NEXP-Complete
    complexity hinders their scalability and application in real-world
    domains. This paper introduces a subclass of distributed POMDPs,
    and TREMOR, an algorithm to solve such distributed POMDPs. The
    primary novelty of TREMOR is that agents plan individually with a
    single agent POMDP solver and use social model shaping to
    implicitly coordinate with other agents. Experiments demonstrate
    that TREMOR can provide solutions orders of magnitude faster than
    existing algorithms while achieving comparable, or even superior,
    solution quality.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Distributed POMDPs},
    bib2html_funding = {ARMY}
    )

2008

  • Katherine K. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Kathryn McKinley, and Doug Burger. Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning. In Proceedings of the Seventh International Joint Conference on Parallel Architectures and Compilation Techniques (PACT), pages 32-42, October 2008. 19% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribution of computation and data become increasingly important performance factors. Explicit Dataflow Graph Execution (EDGE) processors, in which instructions communicate with one another directly on a distributed substrate, give the compiler control over communication overheads at a fine granularity. Prior work shows that compilers can effectively reduce fine-grained communication overheads in EDGE architectures using a spatial instruction placement algorithm with a heuristic-based cost function. While this algorithm is effective, the cost function must be painstakingly tuned. Heuristics tuned to perform well across a variety of applications leave users with little ability to tune performance-critical applications, yet we find that the best placement heuristics vary significantly with the application. First, we suggest a systematic feature selection method that reduces the feature set size based on the extent to which features affect performance. To automatically discover placement heuristics, we then use these features as input to a reinforcement learning technique, called Neuro-Evolution of Augmenting Topologies (NEAT), that uses a genetic algorithm to evolve neural networks. We show that NEAT outperforms simulated annealing, the most commonly used optimization technique for instruction placement. We use NEAT to learn general heuristics that are as effective as hand-tuned heuristics, but we find that improving over highly hand-tuned general heuristics is difficult. We then suggest a hierarchical approach to machine learning that classifies segments of code with similar characteristics and learns heuristics for these classes. This approach performs closer to the specialized heuristics. Together, these results suggest that learning compiler heuristics may benefit from both improved feature selection and classification.

    @inproceedings{PACT08-coons,
    author="Katherine K. Coons and Behnam Robatmili and Matthew E. Taylor and Bertrand A. Maher and Kathryn McKinley and Doug Burger",
    title={{Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning}},
    booktitle={{Proceedings of the Seventh International Joint Conference on Parallel Architectures and Compilation Techniques ({PACT})}},
    month="October",
    year="2008",
    pages="32--42",
    note = {19% acceptance rate},
    wwwnote={<a href="http://www.eecg.toronto.edu/pact/">PACT-2008</a>},
    abstract = {Communication overheads are one of the fundamental challenges in a
    multiprocessor system. As the number of processors on a chip increases,
    communication overheads and the distribution of computation and data
    become increasingly important performance factors. Explicit Dataflow
    Graph Execution (EDGE) processors, in which instructions communicate
    with one another directly on a distributed substrate, give the compiler
    control over communication overheads at a fine granularity. Prior work
    shows that compilers can effectively reduce fine-grained communication
    overheads in EDGE architectures using a spatial instruction placement
    algorithm with a heuristic-based cost function. While this algorithm is
    effective, the cost function must be painstakingly tuned. Heuristics tuned
    to perform well across a variety of applications leave users with little
    ability to tune performance-critical applications, yet we find that the
    best placement heuristics vary significantly with the application.
    <p>
    First, we suggest a systematic feature selection method that reduces the
    feature set size based on the extent to which features affect performance.
    To automatically discover placement heuristics, we then use these features
    as input to a reinforcement learning technique, called Neuro-Evolution
    of Augmenting Topologies (NEAT), that uses a genetic algorithm to evolve
    neural networks. We show that NEAT outperforms simulated annealing, the
    most commonly used optimization technique for instruction placement. We
    use NEAT to learn general heuristics that are as effective as hand-tuned
    heuristics, but we find that improving over highly hand-tuned general
    heuristics is difficult. We then suggest a hierarchical approach
    to machine learning that classifies segments of code with similar
    characteristics and learns heuristics for these classes. This approach
    performs closer to the specialized heuristics. Together, these results
    suggest that learning compiler heuristics may benefit from both improved
    feature selection and classification.
    },
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Autonomic Computing, Machine Learning in Practice},
    }
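
    A note on the feature-selection step described in the entry above: the idea of scoring candidate features by how much they affect a learned predictor's performance can be illustrated with a generic permutation-importance sketch. The fragment below is only a hedged stand-in for that general idea, not the paper's compiler tool-chain; the data, the model choice (a scikit-learn random forest), and the feature counts are all invented for illustration.

        # Generic sketch: rank features by how much shuffling each one degrades a
        # fitted model, then keep the most impactful ones. Illustrative data only.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(6)
        X = rng.normal(size=(400, 8))
        y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=400)  # only features 0 and 3 matter

        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
        base_mse = np.mean((model.predict(X) - y) ** 2)

        impact = []
        for j in range(X.shape[1]):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])      # destroy feature j's information
            impact.append(np.mean((model.predict(Xp) - y) ** 2) - base_mse)

        keep = sorted(np.argsort(impact)[::-1][:2].tolist())
        print("selected features:", keep)             # expected: [0, 3]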

  • Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. Transfer Learning and Intelligence: an Argument and Approach. In Proceedings of the First Conference on Artificial General Intelligence (AGI), March 2008. 50% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    In order to claim fully general intelligence in an autonomous agent, the ability to learn is one of the most central capabilities. Classical machine learning techniques have had many significant empirical successes, but large real-world problems that are of interest to generally intelligent agents require learning much faster (with much less training experience) than is currently possible. This paper presents transfer learning, where knowledge from a learned task can be used to significantly speed up learning in a novel task, as the key to achieving the learning capabilities necessary for general intelligence. In addition to motivating the need for transfer learning in an intelligent agent, we introduce a novel method for selecting types of tasks to be used for transfer and empirically demonstrate that such a selection can lead to significant increases in training speed in a two-player game.

    @inproceedings(AGI08-taylor,
    author="Matthew E. Taylor and Gregory Kuhlmann and Peter Stone",
    title={{Transfer Learning and Intelligence: an Argument and Approach}},
    booktitle={{Proceedings of the First Conference on Artificial General Intelligence ({AGI})}},
    month="March",
    year="2008",
    abstract="In order to claim fully general intelligence in an
    autonomous agent, the ability to learn is one of the most
    central capabilities. Classical machine learning techniques
    have had many significant empirical successes, but large
    real-world problems that are of interest to generally
    intelligent agents require learning much faster (with much
    less training experience) than is currently possible. This
    paper presents transfer learning, where knowledge
    from a learned task can be used to significantly speed up
    learning in a novel task, as the key to achieving the
    learning capabilities necessary for general intelligence. In
    addition to motivating the need for transfer learning in an
    intelligent agent, we introduce a novel method for selecting
    types of tasks to be used for transfer and empirically
    demonstrate that such a selection can lead to significant
    increases in training speed in a two-player game.",
    note = {50% acceptance rate},
    wwwnote={<a href="http://agi-08.org/">AGI-2008</a><br> A video
    of the talk is available <a
    href="http://video.google.com/videoplay?docid=1984013763155542745&hl=en">here</a>.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Transfer Learning},
    bib2html_funding = {NSF, DARPA},
    )

  • Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. Autonomous Transfer for Reinforcement Learning. In Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 283-290, May 2008. 22% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Recent work in transfer learning has succeeded in making reinforcement learning algorithms more efficient by incorporating knowledge from previous tasks. However, such methods typically must be provided either a full model of the tasks or an explicit relation mapping one task into the other. An autonomous agent may not have access to such high-level information, but would be able to analyze its experience to find similarities between tasks. In this paper we introduce Modeling Approximate State Transitions by Exploiting Regression (MASTER), a method for automatically learning a mapping from one task to another through an agent’s experience. We empirically demonstrate that such learned relationships can significantly improve the speed of a reinforcement learning algorithm in a series of Mountain Car tasks. Additionally, we demonstrate that our method may also assist with the difficult problem of task selection for transfer.

    @inproceedings{AAMAS08-taylor,
    author="Matthew E. Taylor and Gregory Kuhlmann and Peter Stone",
    title={{Autonomous Transfer for Reinforcement Learning}},
    booktitle={{Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month="May",
    year="2008",
    pages="283--290",
    abstract={Recent work in transfer learning has succeeded in
    making reinforcement learning algorithms more
    efficient by incorporating knowledge from previous
    tasks. However, such methods typically must be
    provided either a full model of the tasks or an
    explicit relation mapping one task into the
    other. An autonomous agent may not have access to
    such high-level information, but would be able to
    analyze its experience to find similarities between
    tasks. In this paper we introduce Modeling
    Approximate State Transitions by Exploiting
    Regression (MASTER), a method for automatically
    learning a mapping from one task to another through
    an agent's experience. We empirically demonstrate
    that such learned relationships can significantly
    improve the speed of a reinforcement learning
    algorithm in a series of Mountain Car
    tasks. Additionally, we demonstrate that our method
    may also assist with the difficult problem of task
    selection for transfer.},
    note = {22% acceptance rate},
    wwwnote={<a href="http://gaips.inesc-id.pt/aamas2008/">AAMAS-2008</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Transfer Learning},
    bib2html_funding = {DARPA, NSF}
    }
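
    The MASTER abstract above turns mapping selection into a regression problem over the agent's own experience. The following fragment is a minimal, hypothetical sketch of that idea under strong simplifying assumptions (noise-free linear dynamics, a two-variable toy task, exhaustive search over variable permutations); it is not the authors' code, and every name and number in it is illustrative.

        # Sketch: fit a one-step model on source experience, then score each
        # candidate state-variable mapping by its prediction error on target data.
        import itertools
        import numpy as np

        rng = np.random.default_rng(0)
        A = np.array([[0.9, 0.0], [0.1, 0.8]])            # toy source dynamics

        src_s = rng.normal(size=(200, 2))
        src_ns = src_s @ A                                # source (state, next_state) pairs
        tgt_s = rng.normal(size=(50, 2))
        tgt_ns = (tgt_s[:, ::-1] @ A)[:, ::-1]            # target = source with variables swapped

        W, *_ = np.linalg.lstsq(src_s, src_ns, rcond=None)  # learned source model

        def mapping_error(perm):
            """Model error when target variables are read through this mapping."""
            idx = list(perm)
            return float(np.mean((tgt_s[:, idx] @ W - tgt_ns[:, idx]) ** 2))

        best = min(itertools.permutations(range(2)), key=mapping_error)
        print("selected inter-task mapping:", best)       # expected: (1, 0)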

  • Matthew E. Taylor, Nicholas K. Jong, and Peter Stone. Transferring Instances for Model-Based Reinforcement Learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), pages 488-505, September 2008. 19% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have primarily been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample efficiency and asymptotic performance of a model-based algorithm when learning in a continuous state space. Additionally, we conduct experiments to test the limits of TIMBREL’s effectiveness.

    @inproceedings(ECML08-taylor,
    author="Matthew E. Taylor and Nicholas K. Jong and Peter Stone",
    title={{Transferring Instances for Model-Based Reinforcement Learning}},
    booktitle={{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ({ECML PKDD})}},
    pages="488--505",
    month="September",
    year= "2008",
    note = {19% acceptance rate},
    wwwnote={<a href="http://www.ecmlpkdd2008.org/">ECML-2008</a>},
    abstract={Reinforcement learning agents typically require a significant
    amount of data before performing well on complex tasks. Transfer
    learning methods have made progress reducing sample complexity,
    but they have primarily been applied to model-free learning
    methods, not more data-efficient model-based learning
    methods. This paper introduces TIMBREL, a novel method capable of
    transferring information effectively into a model-based
    reinforcement learning algorithm. We demonstrate that TIMBREL can
    significantly improve the sample efficiency and asymptotic
    performance of a model-based algorithm when learning in a
    continuous state space. Additionally, we conduct experiments to
    test the limits of TIMBREL's effectiveness.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Transfer Learning, Reinforcement Learning, Planning},
    bib2html_funding = {NSF, DARPA}
    )
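
    The TIMBREL abstract above is about filling a model-based learner's gaps with transferred experience. A hedged, toy illustration of that instance-transfer idea follows: when few target transitions exist, mapped source transitions are pooled into a simple nearest-neighbour one-step model. The mapping, data, and model here are invented stand-ins, not the paper's algorithm.

        # Sketch: a k-NN transition model that can be queried with target-only
        # instances or with target instances pooled with mapped source instances.
        import numpy as np

        rng = np.random.default_rng(1)

        def predict_next(query, transitions, k=5):
            states, next_states = transitions
            nearest = np.argsort(np.linalg.norm(states - query, axis=1))[:k]
            return next_states[nearest].mean(axis=0)

        src = (rng.uniform(-1, 1, (500, 2)), rng.uniform(-1, 1, (500, 2)))  # plentiful source data
        tgt = (rng.uniform(-1, 1, (10, 2)), rng.uniform(-1, 1, (10, 2)))    # sparse target data

        def map_state(s):      # hand-specified inter-task mapping (identity, for illustration)
            return s

        pooled = (np.vstack([tgt[0], map_state(src[0])]),
                  np.vstack([tgt[1], map_state(src[1])]))

        query = np.array([0.2, -0.3])
        print("target-only prediction:    ", predict_next(query, tgt, k=3))
        print("with transferred instances:", predict_next(query, pooled, k=3))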

2007

  • Mazda Ahmadi, Matthew E. Taylor, and Peter Stone. IFSA: Incremental Feature-Set Augmentation for Reinforcement Learning Tasks. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1120-1127, May 2007. 22% acceptance rate, Finalist for Best Student Paper
    [BibTeX] [Abstract] [Download PDF]

    Reinforcement learning is a popular and successful framework for many agent-related problems because only limited environmental feedback is necessary for learning. While many algorithms exist to learn effective policies in such problems, learning is often used to solve real world problems, which typically have large state spaces, and therefore suffer from the “curse of dimensionality.” One effective method for speeding-up reinforcement learning algorithms is to leverage expert knowledge. In this paper, we propose a method for dynamically augmenting the agent’s feature set in order to speed up value-function-based reinforcement learning. The domain expert divides the feature set into a series of subsets such that a novel problem concept can be learned from each successive subset. Domain knowledge is also used to order the feature subsets in order of their importance for learning. Our algorithm uses the ordered feature subsets to learn tasks significantly faster than if the entire feature set is used from the start. Incremental Feature-Set Augmentation (IFSA) is fully implemented and tested in three different domains: Gridworld, Blackjack and RoboCup Soccer Keepaway. All experiments show that IFSA can significantly speed up learning and motivates the applicability of this novel RL method.

    @inproceedings{AAMAS07-ahmadi,
    author="Mazda Ahmadi and Matthew E. Taylor and Peter Stone",
    title={{{IFSA}: Incremental Feature-Set Augmentation for Reinforcement Learning Tasks}},
    booktitle={{Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    pages="1120--1127",
    month="May",
    year="2007",
    abstract={
    Reinforcement learning is a popular and successful framework for
    many agent-related problems because only limited environmental
    feedback is necessary for learning. While many algorithms exist to
    learn effective policies in such problems, learning is often
    used to solve real world problems, which typically have large state
    spaces, and therefore suffer from the ``curse of dimensionality.''
    One effective method for speeding-up reinforcement learning algorithms
    is to leverage expert knowledge. In this paper, we propose a method
    for dynamically augmenting the agent's feature set in order to
    speed up value-function-based reinforcement learning. The domain
    expert divides the feature set into a series of subsets such that a
    novel problem concept can be learned from each successive
    subset. Domain knowledge is also used to order the feature subsets in
    order of their importance for learning. Our algorithm uses the
    ordered feature subsets to learn tasks significantly faster than if
    the entire feature set is used from the start. Incremental
    Feature-Set Augmentation (IFSA) is fully implemented and tested in
    three different domains: Gridworld, Blackjack and RoboCup Soccer
    Keepaway. All experiments show that IFSA can significantly speed up
    learning and motivates the applicability of this novel RL method.},
    note = {22% acceptance rate, Finalist for Best Student Paper},
    wwwnote={<span align="left" style="color: red; font-weight: bold">Best Student Paper Nomination</span> at <a href="http://www.aamas2007.nl/">AAMAS-2007</a>.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning},
    bib2html_funding = {DARPA, NSF,ONR},
    }
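
    Since the IFSA abstract above spells out the mechanism (expert-ordered feature subsets, each learned on top of the last), a compressed sketch may help. The fragment below assumes a linear value estimator trained by SGD on a synthetic target; it only shows the "grow the feature set, keep the learned weights" loop, not the paper's RL domains.

        # Sketch: learn with the most important feature subset first, then append
        # further subsets while preserving the weights already learned.
        import numpy as np

        rng = np.random.default_rng(2)
        X = rng.normal(size=(1000, 6))
        true_w = np.array([2.0, -1.0, 0.5, 0.2, 0.1, 0.05])   # later features matter less
        y = X @ true_w + 0.1 * rng.normal(size=1000)

        subsets = [[0, 1], [2, 3], [4, 5]]    # expert-ordered feature subsets
        w, active = np.zeros(0), []

        for subset in subsets:
            active += subset
            w = np.concatenate([w, np.zeros(len(subset))])    # new weights start at zero
            for _ in range(500):                              # a few hundred SGD updates per stage
                i = rng.integers(len(X))
                phi = X[i, active]
                w += 0.01 * (y[i] - w @ phi) * phi
            mse = np.mean((X[:, active] @ w - y) ** 2)
            print(f"after features {active}: mse = {mse:.3f}")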

  • Matthew E. Taylor, Cynthia Matuszek, Pace Reagan Smith, and Michael Witbrock. Guiding Inference with Policy Search Reinforcement Learning. In Proceedings of the Twentieth International FLAIRS Conference (FLAIRS), May 2007. 52% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Symbolic reasoning is a well understood and effective approach to handling reasoning over formally represented knowledge; however, simple symbolic inference systems necessarily slow as complexity and ground facts grow. As automated approaches to ontology-building become more prevalent and sophisticated, knowledge base systems become larger and more complex, necessitating techniques for faster inference. This work uses reinforcement learning, a statistical machine learning technique, to learn control laws which guide inference. We implement our learning method in ResearchCyc, a very large knowledge base with millions of assertions. A large set of test queries, some of which require tens of thousands of inference steps to answer, can be answered faster after training over an independent set of training queries. Furthermore, this learned inference module outperforms ResearchCyc’s integrated inference module, a module that has been hand-tuned with considerable effort.

    @inproceedings{FLAIRS07-taylor-inference,
    author="Matthew E. Taylor and Cynthia Matuszek and Pace Reagan Smith and Michael Witbrock",
    title={{Guiding Inference with Policy Search Reinforcement Learning}},
    booktitle={{Proceedings of the Twentieth International FLAIRS Conference ({FLAIRS})}},
    month="May",
    year="2007",
    abstract="Symbolic reasoning is a well understood and
    effective approach to handling reasoning over
    formally represented knowledge; however, simple
    symbolic inference systems necessarily slow as
    complexity and ground facts grow. As automated
    approaches to ontology-building become more
    prevalent and sophisticated, knowledge base systems
    become larger and more complex, necessitating
    techniques for faster inference. This work uses
    reinforcement learning, a statistical machine
    learning technique, to learn control laws which
    guide inference. We implement our learning method in
    ResearchCyc, a very large knowledge base with
    millions of assertions. A large set of test queries,
    some of which require tens of thousands of inference
    steps to answer, can be answered faster after
    training over an independent set of training
    queries. Furthermore, this learned inference module
    outperforms ResearchCyc's integrated inference
    module, a module that has been hand-tuned with
    considerable effort.",
    note = {52% acceptance rate},
    wwwnote={<a href="http://www.cise.ufl.edu/~ddd/FLAIRS/flairs2007/">FLAIRS-2007</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Inference, Machine Learning in Practice},
    bib2html_funding = {DARPA},
    }

  • Matthew E. Taylor, Cynthia Matuszek, Bryan Klimt, and Michael Witbrock. Autonomous Classification of Knowledge into an Ontology. In Proceedings of the Twentieth International FLAIRS Conference (FLAIRS), May 2007. 52% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Ontologies are an increasingly important tool in knowledge representation, as they allow large amounts of data to be related in a logical fashion. Current research is concentrated on automatically constructing ontologies, merging ontologies with different structures, and optimal mechanisms for ontology building; in this work we consider the related, but distinct, problem of how to automatically determine where to place new knowledge into an existing ontology. Rather than relying on human knowledge engineers to carefully classify knowledge, it is becoming increasingly important for machine learning techniques to automate such a task. Automation is particularly important as the rate of ontology building via automatic knowledge acquisition techniques increases. This paper compares three well-established machine learning techniques and shows that they can be applied successfully to this knowledge placement task. Our methods are fully implemented and tested in the Cyc knowledge base system.

    @inproceedings{FLAIRS07-taylor-ontology,
    author="Matthew E. Taylor and Cynthia Matuszek and Bryan Klimt and Michael Witbrock",
    title={{Autonomous Classification of Knowledge into an Ontology}},
    booktitle={{Proceedings of the Twentieth International FLAIRS Conference ({FLAIRS})}},
    month="May",
    year="2007",
    abstract="Ontologies are an increasingly important tool in
    knowledge representation, as they allow large amounts of data
    to be related in a logical fashion. Current research is
    concentrated on automatically constructing ontologies, merging
    ontologies with different structures, and optimal mechanisms
    for ontology building; in this work we consider the related,
    but distinct, problem of how to automatically determine where
    to place new knowledge into an existing ontology. Rather than
    relying on human knowledge engineers to carefully classify
    knowledge, it is becoming increasingly important for machine
    learning techniques to automate such a task. Automation is
    particularly important as the rate of ontology building via
    automatic knowledge acquisition techniques increases. This
    paper compares three well-established machine learning
    techniques and shows that they can be applied successfully to
    this knowledge placement task. Our methods are fully
    implemented and tested in the Cyc knowledge base system.",
    note = {52% acceptance rate},
    wwwnote={<a href="http://www.cise.ufl.edu/~ddd/FLAIRS/flairs2007/">FLAIRS-2007</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Ontologies, Machine Learning in Practice},
    bib2html_funding = {DARPA},
    }

  • Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning. In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 156-163, May 2007. 22% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing they perform comparably to hand-coded mappings.

    @inproceedings{AAMAS07-taylor,
    author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
    title={{Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning}},
    booktitle={{Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    pages="156--163",
    month="May",
    year="2007",
    abstract={ The ambitious goal of transfer learning is to
    accelerate learning on a target task after training on
    a different, but related, source task. While many past
    transfer methods have focused on transferring
    value-functions, this paper presents a method for
    transferring policies across tasks with different
    state and action spaces. In particular, this paper
    utilizes transfer via inter-task mappings for policy
    search methods ({\sc tvitm-ps}) to construct a
    transfer functional that translates a population of
    neural network policies trained via policy search from
    a source task to a target task. Empirical results in
    robot soccer Keepaway and Server Job Scheduling show
    that {\sc tvitm-ps} can markedly reduce learning time
    when full inter-task mappings are available. The
    results also demonstrate that {\sc tvitm-ps} still
    succeeds when given only incomplete inter-task
    mappings. Furthermore, we present a novel method for
    learning such mappings when they are not
    available, and give results showing they perform
    comparably to hand-coded mappings. },
    note = {22% acceptance rate},
    wwwnote={<a href="http://www.aamas2007.nl/">AAMAS-2007</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Transfer Learning},
    bib2html_funding = {DARPA, NSF}
    }
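
    The transfer functional described in the entry above rewrites a source policy network so that it accepts the target task's states and actions. The fragment below is a heavily simplified, hypothetical illustration of how such weight copying via inter-task mappings can look for a single hidden layer; the shapes, the mappings, and the random "trained" weights are all made up.

        # Sketch: build target-network weights by copying source columns/rows
        # according to hand-coded state and action mappings.
        import numpy as np

        rng = np.random.default_rng(3)

        src_in, src_out, hidden = 4, 2, 8      # source: 4 state variables, 2 actions
        tgt_in, tgt_out = 6, 3                 # target: 6 state variables, 3 actions

        W1_src = rng.normal(size=(hidden, src_in))   # stand-ins for trained source weights
        W2_src = rng.normal(size=(src_out, hidden))

        state_map = [0, 1, 2, 3, 0, 1]         # target variable -> similar source variable
        action_map = [0, 1, 0]                 # target action  -> similar source action

        W1_tgt = W1_src[:, state_map]          # copy input columns per mapped state variable
        W2_tgt = W2_src[action_map, :]         # copy output rows per mapped action
        assert W1_tgt.shape == (hidden, tgt_in) and W2_tgt.shape == (tgt_out, hidden)

        def policy(state, W1, W2):
            return int(np.argmax(W2 @ np.tanh(W1 @ state)))

        print("target action:", policy(rng.normal(size=tgt_in), W1_tgt, W2_tgt))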

  • Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison. In Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI), pages 1675-1678, July 2007. Nectar Track, 38% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving difficult RL problems, but few rigorous comparisons have been conducted. Thus, no general guidelines describing the methods’ relative strengths and weaknesses are available. This paper summarizes a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. The results from this study help isolate the factors critical to the performance of each learning method and yield insights into their general strengths and weaknesses.

    @inproceedings(AAAI07-taylor,
    author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
    title={{Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison}},
    pages="1675--1678",
    booktitle={{Proceedings of the Twenty-Second Conference on Artificial Intelligence ({AAAI})}},
    month="July",
    year="2007",
    abstract="Reinforcement learning (RL) methods have become
    popular in recent years because of their ability to solve
    complex tasks with minimal feedback. Both genetic algorithms
    (GAs) and temporal difference (TD) methods have proven
    effective at solving difficult RL problems, but few rigorous
    comparisons have been conducted. Thus, no general guidelines
    describing the methods' relative strengths and weaknesses are
    available. This paper summarizes a detailed empirical
    comparison between a GA and a TD method in Keepaway, a
    standard RL benchmark domain based on robot soccer. The
    results from this study help isolate the factors critical to
    the performance of each learning method and yield insights
    into their general strengths and weaknesses.",
    note = {Nectar Track, 38% acceptance rate},
    wwwnote={<a href="http://www.aaai.org/Conferences/National/2007/aaai07.html">AAAI-2007</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Genetic Algorithms},
    bib2html_funding = {NSF, DARPA}
    )

  • Matthew E. Taylor and Peter Stone. Cross-Domain Transfer for Reinforcement Learning. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), June 2007. 29% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    A typical goal for transfer learning algorithms is to utilize knowledge gained in a source task to learn a target task faster. Recently introduced transfer methods in reinforcement learning settings have shown considerable promise, but they typically transfer between pairs of very similar tasks. This work introduces Rule Transfer, a transfer algorithm that first learns rules to summarize a source task policy and then leverages those rules to learn faster in a target task. This paper demonstrates that Rule Transfer can effectively speed up learning in Keepaway, a benchmark RL problem in the robot soccer domain, based on experience from source tasks in the gridworld domain. We empirically show, through the use of three distinct transfer metrics, that Rule Transfer is effective across these domains.

    @inproceedings(ICML07-taylor,
    author="Matthew E. Taylor and Peter Stone",
    title={{Cross-Domain Transfer for Reinforcement Learning}},
    booktitle={{Proceedings of the Twenty-Fourth International Conference on Machine Learning ({ICML})}},
    month="June",
    year="2007",
    abstract="A typical goal for transfer learning algorithms is
    to utilize knowledge gained in a source task to learn a
    target task faster. Recently introduced transfer methods in
    reinforcement learning settings have shown considerable
    promise, but they typically transfer between pairs of very
    similar tasks. This work introduces Rule Transfer, a
    transfer algorithm that first learns rules to summarize a
    source task policy and then leverages those rules to learn
    faster in a target task. This paper demonstrates that Rule
    Transfer can effectively speed up learning in Keepaway, a
    benchmark RL problem in the robot soccer domain, based on
    experience from source tasks in the gridworld domain. We
    empirically show, through the use of three distinct transfer
    metrics, that Rule Transfer is effective across these
    domains.",
    note = {29% acceptance rate},
    wwwnote={<a href="http://oregonstate.edu/conferences/icml2007">ICML-2007</a>},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Transfer Learning},
    bib2html_funding = {NSF, DARPA} ,
    )
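
    The Rule Transfer recipe above (summarize a source policy as rules, then reuse those rules in the target) admits a very small illustration. The sketch below uses a scikit-learn decision tree as the rule learner on an invented source task and lets the rules suggest actions some fraction of the time in the target; the toy task, the tree, and the probabilistic mixing are assumptions made for illustration, not the paper's exact mechanism or metrics.

        # Sketch: learn compact rules from (state, action) pairs generated by the
        # source policy, then mix the rule-suggested action into a target learner.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(4)

        src_states = rng.normal(size=(500, 3))
        src_actions = (src_states[:, 0] > 0).astype(int)   # toy source policy to summarize

        rules = DecisionTreeClassifier(max_depth=3).fit(src_states, src_actions)

        def target_policy(state, q_values, mix=0.2):
            """With probability `mix`, take the rule-suggested action;
            otherwise act greedily on the target task's own Q-values."""
            if rng.random() < mix:
                return int(rules.predict(state.reshape(1, -1))[0])
            return int(np.argmax(q_values))

        print(target_policy(rng.normal(size=3), q_values=np.zeros(2)))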

2006

  • Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pages 1321-1328, July 2006. 46% acceptance rate, Best Paper Award in GA track (of 85 submissions)
    [BibTeX] [Abstract] [Download PDF]

    Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods’ relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT, a GA that evolves neural networks, with Sarsa, a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.

    @inproceedings{GECCO06-taylor,
    author="Matthew E. Taylor and Shimon Whiteson and Peter Stone",
    title={{Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning}},
    booktitle={{Proceedings of the Genetic and Evolutionary Computation Conference ({GECCO})}},
    month="July",
    year="2006",
    pages="1321--1328",
    abstract={
    Both genetic algorithms (GAs) and temporal
    difference (TD) methods have proven effective at
    solving reinforcement learning (RL) problems.
    However, since few rigorous empirical comparisons
    have been conducted, there are no general guidelines
    describing the methods' relative strengths and
    weaknesses. This paper presents the results of a
    detailed empirical comparison between a GA and a TD
    method in Keepaway, a standard RL benchmark domain
    based on robot soccer. In particular, we compare
    the performance of NEAT~\cite{stanley:ec02evolving},
    a GA that evolves neural networks, with
    Sarsa~\cite{Rummery94,Singh96}, a popular TD method.
    The results demonstrate that NEAT can learn better
    policies in this task, though it requires more
    evaluations to do so. Additional experiments in two
    variations of Keepaway demonstrate that Sarsa learns
    better policies when the task is fully observable
    and NEAT learns faster when the task is
    deterministic. Together, these results help isolate
    the factors critical to the performance of each
    method and yield insights into their general
    strengths and weaknesses.
    },
    note = {46% acceptance rate, Best Paper Award in GA track (of 85 submissions)},
    wwwnote={<span align="left" style="color: red; font-weight: bold">Best Paper Award</span> (Genetic Algorithms Track) at <a href="http://www.sigevo.org/gecco-2006/">GECCO-2006</a>.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Genetic Algorithms, Machine Learning in Practice},
    bib2html_funding = {NSF, DARPA}
    }

2005

  • Matthew E. Taylor and Peter Stone. Behavior Transfer for Value-Function-Based Reinforcement Learning. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 53-59, July 2005. 25% acceptance rate.
    [BibTeX] [Abstract] [Download PDF]

    Temporal difference (TD) learning methods have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found very slow in practice. A key feature of TD methods is that they represent policies in terms of value functions. In this paper we introduce behavior transfer, a novel approach to speeding up TD learning by transferring the learned value function from one task to a second related task. We present experimental results showing that autonomous learners are able to learn one multiagent task and then use behavior transfer to markedly reduce the total training time for a more complex task.

    @inproceedings{AAMAS05-taylor,
    author="Matthew E. Taylor and Peter Stone",
    title={{Behavior Transfer for Value-Function-Based Reinforcement Learning}},
    booktitle={{Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
    month="July",
    year="2005",
    pages="53--59",
    abstract={
    Temporal difference (TD) learning
    methods have become popular
    reinforcement learning techniques in recent years. TD
    methods have had some experimental successes and have
    been shown to exhibit some desirable properties in
    theory, but have often been found very slow in
    practice. A key feature of TD methods is that they
    represent policies in terms of value functions. In
    this paper we introduce \emph{behavior transfer}, a
    novel approach to speeding up TD learning by
    transferring the learned value function from one task
    to a second related task. We present experimental
    results showing that autonomous learners are able to
    learn one multiagent task and then use behavior
    transfer to markedly reduce the total training time
    for a more complex task.
    },
    note = {25% acceptance rate.},
    wwwnote={<a href="http://www.aamas2005.nl/">AAMAS-2005</a>.<br> Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-JMLR07-taylor.html">Transfer Learning via Inter-Task Mappings for Temporal Difference Learning</a>.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Transfer Learning},
    bib2html_funding = {DARPA, NSF},
    }
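
    Behavior transfer as described in the entry above amounts to initializing the target task's value function from the source task's instead of from scratch. A tabular toy version of that warm start is sketched below; the task sizes and the hand-coded state/action mappings are invented, and the paper's experiments use function approximation in Keepaway rather than tables.

        # Sketch: seed a target Q-table from a learned source Q-table through
        # inter-task mappings, then continue ordinary Q-learning from there.
        import numpy as np

        rng = np.random.default_rng(5)

        q_src = rng.normal(size=(10, 2))                     # stand-in for a learned source Q-table

        n_tgt_states, n_tgt_actions = 25, 3
        state_map = [s % 10 for s in range(n_tgt_states)]    # target state  -> similar source state
        action_map = [0, 1, 0]                               # target action -> similar source action

        q_tgt = q_src[np.ix_(state_map, action_map)]         # the transferred warm start
        assert q_tgt.shape == (n_tgt_states, n_tgt_actions)

        alpha, gamma = 0.1, 0.95
        s, a, r, s_next = 3, 2, 1.0, 7                       # one illustrative transition
        q_tgt[s, a] += alpha * (r + gamma * q_tgt[s_next].max() - q_tgt[s, a])
        print(q_tgt[s, a])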

  • Matthew E. Taylor, Peter Stone, and Yaxin Liu. Value Functions for RL-Based Behavior Transfer: A Comparative Study. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI), July 2005. 18% acceptance rate.
    [BibTeX] [Abstract] [Download PDF]

    Temporal difference (TD) learning methods have become popular reinforcement learning techniques in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found slow in practice. This paper presents methods for further generalizing across tasks, thereby speeding up learning, via a novel form of behavior transfer. We compare learning on a complex task with three function approximators, a CMAC, a neural network, and an RBF, and demonstrate that behavior transfer works well with all three. Using behavior transfer, agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup-soccer keepaway domain.

    @inproceedings(AAAI05-taylor,
    author="Matthew E. Taylor and Peter Stone and Yaxin Liu",
    title={{Value Functions for {RL}-Based Behavior Transfer: A Comparative Study}},
    booktitle={{Proceedings of the Twentieth National Conference on Artificial Intelligence ({AAAI})}},
    month="July",
    year="2005",
    abstract={
    Temporal difference (TD) learning methods have
    become popular reinforcement learning techniques in
    recent years. TD methods, relying on function
    approximators to generalize learning to novel
    situations, have had some experimental successes and
    have been shown to exhibit some desirable properties
    in theory, but have often been found slow in
    practice. This paper presents methods for further
    generalizing across tasks, thereby speeding
    up learning, via a novel form of behavior
    transfer. We compare learning on a complex task
    with three function approximators, a CMAC, a neural
    network, and an RBF, and demonstrate that behavior
    transfer works well with all three. Using behavior
    transfer, agents are able to learn one task and then
    markedly reduce the time it takes to learn a more
    complex task. Our algorithms are fully implemented
    and tested in the RoboCup-soccer keepaway domain.
    },
    note = {18% acceptance rate.},
    wwwnote={<a href="http://www.aaai.org/Conferences/National/2005/aaai05.html">AAAI-2005</a>. <br> Superseded by the journal article <a href="http://cs.lafayette.edu/~taylorm/Publications/b2hd-JMLR07-taylor.html">Transfer Learning via Inter-Task Mappings for Temporal Difference Learning</a>.},
    bib2html_pubtype = {Refereed Conference},
    bib2html_rescat = {Reinforcement Learning, Transfer Learning},
    bib2html_funding = {NSF, DARPA}
    )