Intelligent Robot Learning Laboratory (IRL Lab) Yusen Zhan

CONTACT INFORMATION:
Yusen Zhan
PhD in Computer Science, Fall 2016
Email: yusen.zhan@wsu.edu
Links: Personal Website


Thesis: Policy Advice, Non-convex and Distributed Optimization in Reinforcement Learning

ABSTRACT: Transfer learning is a machine learning approach that uses knowledge from previous training to speed up the learning process. Policy advice is a type of transfer learning in which a student agent learns faster via advice from a teacher agent: the agent that provides advice (actions) is the teacher, and the agent that receives it is the student. However, this and other current transfer methods in reinforcement learning have received little theoretical analysis. This dissertation formally defines a setting in which multiple teacher agents can provide advice to a student and introduces an algorithm that leverages both autonomous exploration and the teachers' advice. Regret bounds are provided, and negative transfer is formally defined and studied.
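
To make the setting concrete, here is a minimal sketch (in Python) of how a tabular Q-learning student could combine epsilon-greedy exploration with budgeted advice from several teachers. The class name, the per-teacher budgets, and the majority-vote aggregation are illustrative assumptions only, not the algorithm whose regret bounds are derived in the dissertation.

import random
from collections import defaultdict

class AdvisedQLearner:
    """Tabular Q-learning student that can take advice from several teachers.

    Illustrative sketch only: each teacher is a callable state -> action,
    consulted while its advice budget lasts; otherwise the student explores
    on its own with epsilon-greedy action selection.
    """

    def __init__(self, actions, teachers, budgets, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)         # (state, action) -> estimated value
        self.actions = list(actions)
        self.teachers = list(teachers)      # callables: state -> advised action
        self.budgets = list(budgets)        # remaining advice per teacher
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def _advice(self, state):
        votes = []
        for i, teacher in enumerate(self.teachers):
            if self.budgets[i] > 0:
                votes.append(teacher(state))
                self.budgets[i] -= 1
        # Aggregate multiple teachers by simple majority vote (an assumption).
        return max(set(votes), key=votes.count) if votes else None

    def act(self, state):
        advised = self._advice(state)
        if advised is not None:
            return advised
        if random.random() < self.epsilon:  # autonomous exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])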

Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been applied successfully to high-dimensional problems such as robotics control. Though successful, current methods can produce unsafe policy parameters that damage hardware units. Motivated by such constraints, Bhatnagar et al. and others proposed projection-based methods for safe policies; these methods, however, can only handle convex policy constraints. This dissertation contributes the first safe policy search reinforcement learner capable of operating under non-convex policy constraints, achieved by observing a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, the Mann and two-step iterations, to solve the resulting problems and prove convergence in the non-convex stochastic setting.
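
As an illustration of the Mann iteration idea, the sketch below applies a Mann-style averaging step to a projected update for a variational inequality VI(F, C). The operator F, the projection routine, and the step-size schedule are placeholder choices, and the convergence conditions established in the dissertation are not reproduced here.

import numpy as np

def mann_vi(F, project, x0, steps=1000, eta=0.05):
    """Mann-style iteration for the variational inequality VI(F, C):
    find x* in C such that <F(x*), x - x*> >= 0 for all x in C.

    Each step averages the current iterate with a projected operator step,
        x_{k+1} = (1 - a_k) * x_k + a_k * P_C(x_k - eta * F(x_k)),
    with diminishing weights a_k. `project` stands in for the (possibly
    non-convex) projection P_C; in the stochastic setting F(x) would be
    replaced by a noisy estimate.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        a_k = 1.0 / (k + 1)                       # diminishing averaging weight
        candidate = project(x - eta * F(x))       # projected operator step
        x = (1.0 - a_k) * x + a_k * candidate     # Mann averaging
    return x

# Toy usage: F derived from a quadratic objective, constraint set = unit ball
# (convex here for simplicity; `project` could equally encode a non-convex set).
if __name__ == "__main__":
    A = np.array([[2.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -1.0])
    F = lambda x: A @ x - b
    project = lambda x: x / max(1.0, np.linalg.norm(x))
    print(mann_vi(F, project, x0=np.zeros(2)))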

Lastly, lifelong reinforcement learning is a framework, similar to transfer learning, that allows agents to learn multiple consecutive tasks sequentially and online. Current methods, however, suffer from scalability issues when the agent must solve a large number of tasks. This dissertation remedies these drawbacks with a novel scalable technique for lifelong reinforcement learning: an algorithm that assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange.
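
The distributed pattern can be pictured as follows: each processing unit keeps a local copy of the shared repository, mixes it with its neighbors' copies (local information exchange only), and then updates it on its own tasks. The sketch below is a generic consensus-plus-local-gradient round with made-up names; it is not the exact update rule or the linear-rate analysis from the dissertation.

import numpy as np

def consensus_lifelong_step(local_repos, neighbors, local_grads, step=0.1):
    """One synchronous round of a consensus-style shared-repository update.

    local_repos : list of d-dimensional arrays, one per processing unit
                  (each unit's local copy of the shared knowledge repository).
    neighbors   : dict mapping unit index -> list of neighbor indices.
    local_grads : list of callables grad_i(repo) -> gradient of unit i's
                  local, task-specific objective.

    Each unit averages its copy with its neighbors' copies (local information
    exchange only) and then takes a gradient step on its own tasks. This is a
    generic distributed-optimization pattern, not the paper's exact update.
    """
    n = len(local_repos)
    mixed = []
    for i in range(n):
        group = [local_repos[i]] + [local_repos[j] for j in neighbors[i]]
        mixed.append(np.mean(group, axis=0))          # exchange with neighbors only
    return [mixed[i] - step * local_grads[i](mixed[i]) for i in range(n)]

# Toy usage: 3 units on a line graph, each pulling the repository toward its own target.
if __name__ == "__main__":
    targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
    repos = [np.zeros(2) for _ in range(3)]
    nbrs = {0: [1], 1: [0, 2], 2: [1]}
    grads = [(lambda t: (lambda w: 2.0 * (w - t)))(t) for t in targets]
    for _ in range(200):
        repos = consensus_lifelong_step(repos, nbrs, grads)
    print(repos)  # local copies pulled toward consensus and toward each unit's tasks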

Projects

By: Yusen Zhan and Matthew E. Taylor

We developed an advice-model framework that provides theoretical and practical analysis for agents teaching humans and other agents in sequential reinforcement learning tasks. Teacher agents assist students (humans or agents) with action advice when they observe the students reaching critical states. Assuming the teachers are optimal, students that follow the action advice achieve better performance; a minimal sketch of this advising decision follows the references below. [1, 2]

[1] [pdf] Yusen Zhan and Matthew E. Taylor. Online Transfer Learning in Reinforcement Learning Domains. In Proceedings of the AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), November 2015.
[Bibtex]
@inproceedings{2015SDMIA-Zhan,
author={Yusen Zhan and Matthew E. Taylor},
title={{Online Transfer Learning in Reinforcement Learning Domains}},
booktitle={{Proceedings of the {AAAI} Fall Symposium on Sequential Decision Making for Intelligent Agents ({SDMIA})}},
month={November},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation under a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.}
}
[2] [pdf] Yusen Zhan, Anestis Fachantidis, Ioannis Vlahavas, and Matthew E. Taylor. Agents Teaching Humans in Reinforcement Learning Tasks. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS), May 2014.
[Bibtex]
@inproceedings{2014ALA-Zhan,
author={Yusen Zhan and Anestis Fachantidis and Ioannis Vlahavas and Matthew E. Taylor},
title={{Agents Teaching Humans in Reinforcement Learning Tasks}},
booktitle={{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS})}},
month={May},
year= {2014},
bib2html_pubtype={Refereed Workshop or Symposium},
bib2html_rescat={Reinforcement Learning},
}
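
For concreteness, here is a minimal sketch of the budgeted advising decision described above: the teacher offers its action only in states it judges critical, measured here by the gap between its best and worst Q-values. The function name, the importance measure, and the threshold are illustrative choices, not the exact criterion used in the papers.

def importance_advising(teacher_q, actions, state, budget, threshold=1.0):
    """Return (advised_action_or_None, remaining_budget).

    teacher_q maps (state, action) -> the teacher's Q-value. A state counts as
    "critical" when the spread between its best and worst action values is
    large, i.e. acting badly there is costly; only then is budget spent.
    """
    if budget <= 0:
        return None, budget
    values = [teacher_q.get((state, a), 0.0) for a in actions]
    if max(values) - min(values) < threshold:
        return None, budget                 # not a critical state: save the budget
    best_action = actions[values.index(max(values))]
    return best_action, budget - 1

# Example: the teacher advises "brake" in a state where the value gap is large.
if __name__ == "__main__":
    q = {("near_cliff", "forward"): -5.0, ("near_cliff", "brake"): 1.0}
    print(importance_advising(q, ["forward", "brake"], "near_cliff", budget=10))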

Publications

2017

  • Yusen Zhan, Haitham Bou Ammar, and Matthew E. Taylor. Scalable Lifelong Reinforcement Learning. Pattern Recognition, 72:407-418, 2017.
    [BibTeX] [Abstract] [Download PDF] [DOI]

    Lifelong reinforcement learning provides a successful framework for agents to learn multiple consecutive tasks sequentially. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this paper, we remedy the above drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm which assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange. We then show an improvement to reach a linear convergence rate compared to current lifelong policy search methods. Finally, we evaluate our technique on a set of benchmark dynamical systems and demonstrate learning speed-ups and reduced running times.

    @article{2017PatternRecognition-Zhan,
    author={Zhan, Yusen and Bou Ammar, Haitham and Taylor, Matthew E.},
    title={{Scalable Lifelong Reinforcement Learning}},
    journal={{Pattern Recognition}},
    year={2017},
    issn={0031-3203},
    volume={72},
    pages={407--418},
    doi={10.1016/j.patcog.2017.07.031},
    url={http://www.sciencedirect.com/science/article/pii/S0031320317303023},
    keywords={Reinforcement learning},
    keywords={Lifelong learning},
    keywords={Distributed optimization},
    keywords={Transfer learning},
    abstract={Lifelong reinforcement learning provides a successful framework for agents to learn multiple consecutive tasks sequentially. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks.
    In this paper, we remedy the above drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm which assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange. We then show an improvement to reach a \emph{linear convergence rate} compared to current lifelong policy search methods. Finally, we evaluate our technique on a set of benchmark dynamical systems and demonstrate learning speed-ups and reduced running times.}
    }

  • Yusen Zhan, Haitham Bou Ammar, and Matthew E. Taylor. Non-convex Policy Search Using Variational Inequalities. Neural Computation, 29(10):2800-2824, 2017.
    [BibTeX] [Abstract] [Download PDF] [DOI]

    Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems, such as robotics control. Though successful, current methods can lead to unsafe policy parameters potentially damaging hardware units. Motivated by such constraints, projection-based methods are proposed for safe policies. These methods, however, can only handle convex policy constraints. In this paper, we propose the first safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing, for the first time, a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, i.e., Mann and two-step iteration, to solve the above problems and prove convergence in the non-convex stochastic setting. Finally, we demonstrate the performance of the above algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.

    @article{2017NeuralComputation-Zhan,
    author={Zhan, Yusen and Bou Ammar, Haitham and Taylor, Matthew E.},
    title={{Non-convex Policy Search Using Variational Inequalities}},
    journal={{Neural Computation}},
    volume={29},
    number={10},
    pages={2800--2824},
    year={2017},
    doi={10.1162/neco_a_01004},
    abstract={Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems, such as robotics control. Though successful, current methods can lead to unsafe policy parameters potentially damaging hardware units. Motivated by such constraints, projection-based methods are proposed for safe policies.
    These methods, however, can only handle convex policy constraints. In this paper, we propose the first safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing, for the first time, a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, i.e., Mann and two-step iteration, to solve the above problems and prove convergence in the non-convex stochastic setting. Finally, we demonstrate the performance of the above algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.}
    }

  • Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote. An Exploration Strategy Facing Non-Stationary Agents (JAAMAS Extended Abstract). In The 16th International Conference on Autonomous Agents and Multiagent Systems, pages 922-923, Sao Paulo, Brazil, May 2017.
    [BibTeX] [Abstract] [Download PDF]

    The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. This work investigates how to design exploration strategies in non-stationary and adversarial environments. Our experimental setting uses a two-agent strategic interaction scenario, where the opponent switches between different behavioral patterns. The agent’s objective is to learn a model of the opponent’s strategy to act optimally, despite non-determinism and stochasticity. Our contribution is twofold. First, we present drift exploration as a strategy for switch detection. Second, we propose a new algorithm called R-max# that reasons and acts in terms of two objectives: 1) to maximize utilities in the short term while learning and 2) eventually explore implicitly looking for opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model in terms of finite sample complexity.

    @inproceedings{2017JAAMAS-HernandezLeal-nsa,
    author={Hernandez-Leal, Pablo and Zhan, Yusen and Taylor, Matthew E. and Sucar, L. Enrique and Munoz de Cote, Enrique},
    title={{An Exploration Strategy Facing Non-Stationary Agents ({JAAMAS} Extended Abstract)}},
    booktitle={{The 16th International Conference on Autonomous Agents and Multiagent Systems}},
    month={May},
    year={2017},
    pages={922--923},
    address={Sao Paulo, Brazil},
    abstract={The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. This work investigates how to design exploration strategies in non-stationary and adversarial environments. Our experimental setting uses a two-agent strategic interaction scenario, where the opponent switches between different behavioral patterns. The agent’s objective is to learn a model of the opponent’s strategy to act optimally, despite non-determinism and stochasticity. Our contribution is twofold. First, we present drift exploration as a strategy for switch detection. Second, we propose a new algorithm called R-max# that reasons and acts in terms of two objectives: 1) to maximize utilities in the short term while learning and 2) eventually explore implicitly looking for opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model in terms of finite sample complexity.}
    }

  • Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote. Detecting Switches Against Non-Stationary Opponents (JAAMAS Extended Abstract). In The 16th International Conference on Autonomous Agents and Multiagent Systems, pages 920-921, Sao Paulo, Brazil, May 2017.
    [BibTeX] [Abstract] [Download PDF]

    Interactions in multiagent systems are generally more complicated than single agent ones. Game theory provides solutions on how to act in multiple agent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real world scenarios where agents have limited capacities and may deviate from a perfect rational response. Our goal is still to act optimally in these cases by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem when another agent in the environment uses different stationary strategies over time. This paper introduces DriftER, an algorithm that 1) learns a model of the opponent, 2) uses that to obtain an optimal policy and then 3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER guarantees to detect switches with high probability. Also, we provide empirical results in normal form games and then in a more realistic scenario, the Power TAC simulator.

    @inproceedings{2017JAAMAS-HernandezLeal-nso,
    author = {Hernandez-Leal, Pablo and Zhan, Yusen and Taylor, Matthew E. and Sucar, L. Enrique and Munoz de Cote, Enrique},
    title = {{Detecting Switches Against Non-Stationary Opponents ({JAAMAS} Extended Abstract)}},
    booktitle={{The 16th International Conference on Autonomous Agents and Multiagent Systems}},
    month={May},
    year={2017},
    pages={920--921},
    address={Sao Paulo, Brazil},
    abstract={Interactions in multiagent systems are generally more complicated than single agent ones. Game theory provides solutions on how to act in multiple agent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real world scenarios where agents have limited capacities and may deviate from a perfect rational response. Our goal is still to act optimally in these cases by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem when another agent in the environment uses different stationary strategies over time. This paper introduces DriftER, an algorithm that 1) learns a model of the opponent, 2) uses that to obtain an optimal policy and then 3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER guarantees to detect switches with high probability. Also, we provide empirical results in normal form games and then in a more realistic scenario, the Power TAC simulator.}
    }

2016

  • Yusen Zhan, Haitham Bou Ammar, and Matthew E. Taylor. Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), July 2016. 25% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher’s advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.

    @inproceedings{2016IJCAI-Zhan,
    author={Yusen Zhan and Haitham Bou Ammar and Matthew E. Taylor},
    title={{Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer}},
    booktitle={{Proceedings of the 25th International Joint Conference on Artificial Intelligence ({IJCAI})}},
    month={July},
    year={2016},
    note={25\% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    abstract={Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher’s advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.}
    }

  • Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote. Efficiently detecting switches against non-stationary opponents. Autonomous Agents and Multi-Agent Systems, pages 1-23, November 2016.
    [BibTeX] [Abstract] [Download PDF] [DOI]

    Interactions in multiagent systems are generally more complicated than single agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real world scenarios where agents have limited capacities and may deviate from a perfect rational response. Our goal is still to act optimally in these cases by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem when another agent in the environment uses different stationary strategies over time. This will turn the problem into learning in a non-stationary environment, posing a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that to obtain an optimal policy and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER guarantees to detect switches with high probability. Also, we provide empirical results showing that our approach outperforms state of the art algorithms, in normal form games such as prisoner’s dilemma and then in a more realistic scenario, the Power TAC simulator.

    @article{2016JAAMAS2-Hernandez-Leal,
    author={Pablo Hernandez-Leal and Yusen Zhan and Matthew E. Taylor and L. Enrique {Sucar} and Enrique {Munoz de Cote}},
    title={{Efficiently detecting switches against non-stationary opponents}},
    journal={{Autonomous Agents and Multi-Agent Systems}},
    pages={1--23},
    month={November},
    year={2016},
    doi={10.1007/s10458-016-9352-6},
    url={http://dx.doi.org/10.1007/s10458-016-9352-6},
    issn={1387-2532},
    abstract={Interactions in multiagent systems are generally more complicated than single agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real world scenarios where agents have limited capacities and may deviate from a perfect rational response. Our goal is still to act optimally in these cases by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem when another agent in the environment uses different stationary strategies over time. This will turn the problem into learning in a non-stationary environment, posing a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that to obtain an optimal policy and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER guarantees to detect switches with high probability. Also, we provide empirical results showing that our approach outperforms state of the art algorithms, in normal form games such as prisoner’s dilemma and then in a more realistic scenario, the Power TAC simulator.}
    }
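
    As a rough illustration of the "detect, then re-learn" trigger described in the DriftER entry above, the sketch below (Python) tracks how often the current opponent model predicts the observed actions over a sliding window and flags a possible switch when that accuracy drops. The window size, the threshold, and the class name are illustrative assumptions, not the statistic or the guarantee from the paper.

    from collections import deque

    class SwitchDetector:
        """Flags a possible opponent strategy switch when the current opponent
        model's recent prediction accuracy drops below a threshold.

        Illustrative sketch of a "detect, then re-learn" trigger; DriftER's
        actual statistic and theoretical guarantees are not reproduced here.
        """

        def __init__(self, window=50, min_accuracy=0.7):
            self.recent = deque(maxlen=window)   # 1 = correct prediction, 0 = miss
            self.min_accuracy = min_accuracy

        def observe(self, predicted_action, actual_action):
            self.recent.append(1.0 if predicted_action == actual_action else 0.0)

        def switch_detected(self):
            if len(self.recent) < self.recent.maxlen:
                return False                     # not enough evidence yet
            return sum(self.recent) / len(self.recent) < self.min_accuracy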

  • Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote. An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems, pages 1-32, October 2016.
    [BibTeX] [Abstract] [Download PDF] [DOI]

    The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy in time. We use a two agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent’s objective is to learn a model of the opponent’s strategy to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model in terms of finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally how using DE outperforms the state of the art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.

    @article{2016JAAMAS-Hernandez-Leal,
    author={Pablo Hernandez-Leal and Yusen Zhan and Matthew E. Taylor and L. Enrique {Sucar} and Enrique {Munoz de Cote}},
    title={{An exploration strategy for non-stationary opponents}},
    journal={{Autonomous Agents and Multi-Agent Systems}},
    pages={1--32},
    month={October},
    year={2016},
    issn={1573-7454},
    doi={10.1007/s10458-016-9347-3},
    url={http://dx.doi.org/10.1007/s10458-016-9347-3},
    abstract={The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy in time. We use a two agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent's objective is to learn a model of the opponent's strategy to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max{\#} for learning and planning against non-stationary opponents. To handle such opponents, R-max{\#} reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) eventually explore opponent behavioral changes. We provide theoretical results showing that R-max{\#} is guaranteed to detect the opponent's switch and learn a new model in terms of finite sample complexity. R-max{\#} makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally how using DE outperforms the state of the art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.}
    }
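
    The drift-exploration idea in the entry above can be caricatured as R-max-style optimism whose visit counts are periodically forgotten, so "known" state-action pairs become unknown again and the agent keeps probing for opponent changes. The sketch below is only that caricature; the parameters and the reset rule are illustrative assumptions, and the actual R-max# algorithm and its sample-complexity analysis differ.

    from collections import defaultdict

    class DriftExplorer:
        """R-max-style optimism with periodic count resets (drift-exploration sketch).

        State-action pairs visited fewer than m times are treated optimistically
        with value r_max, which drives exploration. Clearing the counts every
        `reset_every` steps makes previously "known" pairs unknown again, so the
        agent keeps probing for opponent behavioral changes.
        """

        def __init__(self, m=5, r_max=1.0, reset_every=500):
            self.counts = defaultdict(int)
            self.m, self.r_max, self.reset_every = m, r_max, reset_every
            self.t = 0

        def optimistic_reward(self, state, action, observed_reward):
            self.t += 1
            if self.t % self.reset_every == 0:
                self.counts.clear()              # re-explore: forget what is "known"
            self.counts[(state, action)] += 1
            if self.counts[(state, action)] < self.m:
                return self.r_max                # unknown pair: assume the best
            return observed_reward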

2015

  • Yusen Zhan and Matthew E. Taylor. Online Transfer Learning in Reinforcement Learning Domains. In Proceedings of the AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (SDMIA), November 2015.
    [BibTeX] [Abstract] [Download PDF]

    This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation under a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.

    @inproceedings{2015SDMIA-Zhan,
    author={Yusen Zhan and Matthew E. Taylor},
    title={{Online Transfer Learning in Reinforcement Learning Domains}},
    booktitle={{Proceedings of the {AAAI} Fall Symposium on Sequential Decision Making for Intelligent Agents ({SDMIA})}},
    month={November},
    year={2015},
    bib2html_pubtype={Refereed Workshop or Symposium},
    abstract={This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation under a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.}
    }

2014

  • Chris HolmesParker, Matthew E. Taylor, Yusen Zhan, and Kagan Tumer. Exploiting Structure and Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS), May 2014.
    [BibTeX] [Download PDF]
    @inproceedings{2014ALA-HolmesParker,
    author={Chris HolmesParker and Matthew E. Taylor and Yusen Zhan and Kagan Tumer},
    title={{Exploiting Structure and Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems}},
    booktitle={{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS})}},
    month={May},
    year= {2014},
    bib2html_pubtype={Refereed Workshop or Symposium},
    bib2html_rescat={Reinforcement Learning},
    }

  • Yusen Zhan, Anestis Fachantidis, Ioannis Vlahavas, and Matthew E. Taylor. Agents Teaching Humans in Reinforcement Learning Tasks. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS), May 2014.
    [BibTeX] [Download PDF]
    @inproceedings{2014ALA-Zhan,
    author={Yusen Zhan and Anestis Fachantidis and Ioannis Vlahavas and Matthew E. Taylor},
    title={{Agents Teaching Humans in Reinforcement Learning Tasks}},
    booktitle={{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS})}},
    month={May},
    year= {2014},
    bib2html_pubtype={Refereed Workshop or Symposium},
    bib2html_rescat={Reinforcement Learning},
    }

2013

  • Yusen Zhan, Jun Wu, Chongjun Wang, Meilin Liu, and Junyuan Xie. On the Complexity of Undominated Core and Farsighted Solution Concepts in Coalitional Games. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS ’13, pages 1177-1178, Richland, SC, 2013. International Foundation for Autonomous Agents and Multiagent Systems.
    [BibTeX] [Download PDF]
    @inproceedings{2013complexity-zhan,
    author={Yusen Zhan and Jun Wu and Chongjun Wang and Meilin Liu and Junyuan Xie},
    title={{On the Complexity of Undominated Core and Farsighted Solution Concepts in Coalitional Games}},
    booktitle={{Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems}},
    series={AAMAS '13},
    year={2013},
    isbn={978-1-4503-1993-5},
    location={St. Paul, MN, USA},
    pages={1177--1178},
    numpages={2},
    url={http://dl.acm.org/citation.cfm?id=2484920.2485130},
    acmid={2485130},
    publisher={International Foundation for Autonomous Agents and Multiagent Systems},
    address={Richland, SC},
    keywords={coalition formation, computational complexity, coordination, game theory (cooperative and non-cooperative), teamwork},
    }

2012

  • Yusen Zhan, Jun Wu, Chongjun Wang, and Junyuan Xie. On the Complexity and Algorithms of Coalition Structure Generation in Overlapping Coalition Formation Games. In 2012 IEEE 24th International Conference on Tools with Artificial Intelligence (ICTAI), volume 1, pages 868-873. IEEE, 2012.
    [BibTeX] [DOI]
    @inproceedings{zhan2012complexity,
    title={{On the Complexity and Algorithms of Coalition Structure Generation in Overlapping Coalition Formation Games}},
    author={Yusen Zhan and Jun Wu and Chongjun Wang and Junyuan Xie},
    booktitle={{Tools with Artificial Intelligence ({ICTAI}), 2012 {IEEE} 24th International Conference on}},
    volume={1},
    pages={868--873},
    year={2012},
    doi={10.1109/ICTAI.2012.121},
    organization={IEEE}
    }