Intelligent Robot Learning Laboratory (IRL Lab) Zhaodong Wang

CONTACT INFORMATION:

Zhaodong Wang
PhD Student, Computer Science
Email: zhaodong.wang@wsu.edu
Office: Dana Hall 3


My Story

My name is Zhaodong Wang. I am a Ph.D. student and have been working with Dr. Matthew E. Taylor since 2014. I received my bachelor's degree in Electrical Engineering from the University of Science and Technology of China in 2014.

My Research

My research interests include Reinforcement Learning, Transfer Learning, and real-world Robotics. I am motivated by the potential of AI and robotics techniques to change people's lives.

Current Projects

By: Zhaodong Wang and Matthew E. Taylor

The purpose of this project is to build an intelligent multi-robot system that manages the bins used during harvest work in orchards. The project involves autonomous robot navigation in orchard environments and cooperation with human pickers. The value of this multi-robot bin-managing system lies in enabling robots to operate autonomously in challenging outdoor environments and in improving harvest efficiency for agricultural work; an illustrative navigation sketch follows the reference below. [1]

[1] [pdf] Yawei Zhang, Yunxiang Ye, Zhaodong Wang, Matthew E. Taylor, Geoffrey A. Hollinger, and Qin Zhang. Intelligent In-Orchard Bin-Managing System for Tree Fruit Production. In Proceedings of the Robotics in Agriculture workshop (at ICRA), May 2015.
[Bibtex]
@inproceedings{2015ICRA-Zhang,
author={Yawei Zhang and Yunxiang Ye and Zhaodong Wang and Matthew E. Taylor and Geoffrey A. Hollinger and Qin Zhang},
title={{Intelligent In-Orchard Bin-Managing System for Tree Fruit Production}},
booktitle={{Proceedings of the Robotics in Agriculture workshop (at {ICRA})}},
month={May},
year={2015},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={The labor-intensive nature of harvest in the tree fruit industry makes it particularly sensitive to labor shortages. Technological innovation is thus critical in order to meet current demands without significantly increasing prices. This paper introduces a robotic system to help human workers during fruit harvest. A second-generation prototype is currently being built and simulation results demonstrate potential improvement in productivity.}
}
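
The paper above reports field tests of headland turning and straight-line tracking of the alleyway centerline, but does not spell out a controller. Purely as an illustration of the kind of row-following such a platform needs, here is a textbook pure-pursuit steering rule in Python; the lookahead point, wheelbase value, and function name are my assumptions, not details from the paper.

    import math

    def pure_pursuit_steering(pose, goal, wheelbase=1.2):
        """pose: (x, y, heading) of the robot; goal: a lookahead point
        on the row centerline. Returns a steering angle in radians."""
        x, y, heading = pose
        dx, dy = goal[0] - x, goal[1] - y
        # Angle from the robot's heading to the lookahead point.
        alpha = math.atan2(dy, dx) - heading
        lookahead = math.hypot(dx, dy)
        # Classic pure-pursuit law: steer = atan(2 * L * sin(alpha) / d).
        return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)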

By: Zhaodong Wang and Matthew E. Taylor

Many learning methods, such as reinforcement learning, suffer from a slow start, especially in complicated domains. The motivation of transfer learning is to use limited prior knowledge to help learning agents bootstrap early on and thus improve overall learning performance. Because prior knowledge may be limited in quantity or quality, how to make the transfer efficient and effective remains an open question; a toy sketch of one approach appears after the reference below. [1]

[1] [pdf] Zhaodong Wang and Matthew E. Taylor. Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study. In AAAI 2016 Spring Symposium, March 2016.
[Bibtex]
@inproceedings{2016AAAI-SSS-Wang,
author={Zhaodong Wang and Matthew E. Taylor},
title={{Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study}},
booktitle={{{AAAI} 2016 Spring Symposium}},
month={March},
year={2016},
bib2html_pubtype={Refereed Workshop or Symposium},
abstract={There are many successful methods for transferring information from one agent to another. One approach, taken in this work, is to have one (source) agent demonstrate a policy to a second (target) agent, and then have that second agent improve upon the policy. By allowing the target agent to observe the source agent's demonstrations, rather than relying on other types of direct knowledge transfer like Q-values, rules, or shared representations, we remove the need for the agents to know anything about each other's internal representation or have a shared language. In this work, we introduce a refinement to HAT, an existing transfer learning method, by integrating the target agent's confidence in its representation of the source agent's policy. Results show that a target agent can effectively 1) improve its initial performance relative to learning without transfer (jumpstart) and 2) improve its performance relative to the source agent (total reward). Furthermore, both the jumpstart and total reward are improved with this new refinement, relative to learning without transfer and relative to learning with HAT.}
}
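
To make the idea above concrete, here is a minimal sketch of demonstration-based transfer with a confidence gate, in the spirit of [1] and the IJCAI 2017 paper listed below: distill the source agent's (state, action) demonstrations into a supervised model of its policy, then let the target agent follow that model early in learning only where the model is confident. The classifier choice, threshold, and all names here are illustrative assumptions, not the exact published algorithm.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    class ConfidenceDemoPolicy:
        """Estimate the source agent's policy from demonstrations and
        report how confident that estimate is for a given state."""

        def __init__(self, demo_states, demo_actions, threshold=0.8):
            self.model = RandomForestClassifier(n_estimators=50)
            self.model.fit(demo_states, demo_actions)
            self.threshold = threshold  # assumed cutoff, tuned per domain

        def suggest(self, state):
            """Return (action, is_confident) for one state."""
            proba = self.model.predict_proba(np.asarray(state).reshape(1, -1))[0]
            best = int(np.argmax(proba))
            return int(self.model.classes_[best]), bool(proba[best] >= self.threshold)

During early episodes the target agent would take the suggested action when is_confident is True and fall back to its own (e.g., epsilon-greedy) choice otherwise, still updating its value function from every outcome so that it can eventually outperform the source.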

Publications

2017

  • James MacGlashan, Mark K. Ho, Robert Loftin, Bei Peng, Guan Wang, David L. Roberts, Matthew E. Taylor, and Michael L. Littman. Interactive Learning from Policy-Dependent Human Feedback. In Proceedings of the 34th International Conference on Machine Learning (ICML), August 2017. 25% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner’s current policy. We present empirical results that show this assumption to be false—whether human trainers give a positive or negative feedback for a decision is influenced by the learner’s current policy. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot.

    @inproceedings{2017ICML-Macglashan,
    author={James MacGlashan and Mark K. Ho and Robert Loftin and Bei Peng and Guan Wang and David L. Roberts and Matthew E. Taylor and Michael L. Littman},
    title={{Interactive Learning from Policy-Dependent Human Feedback}},
    booktitle={{Proceedings of the 34th International Conference on Machine Learning ({ICML})}},
    month={August},
    year={2017},
    note={25% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning, Human-robot Interaction},
    abstract={This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner's current policy. We present empirical results that show this assumption to be false---whether human trainers give a positive or negative feedback for a decision is influenced by the learner's current policy. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot.}
    }

  • Zhaodong Wang and Matthew E. Taylor. Improving Reinforcement Learning with Confidence-Based Demonstrations. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), August 2017. 26% acceptance rate
    [BibTeX] [Abstract] [Download PDF]

    Reinforcement learning has had many successes, but in practice it often requires significant amounts of data to learn high-performing policies. One common way to improve learning is to allow a trained (source) agent to assist a new (target) agent. The goals in this setting are to 1) improve the target agent’s performance, relative to learning unaided, and 2) allow the target agent to outperform the source agent. Our approach leverages source agent demonstrations, removing any requirements on the source agent’s learning algorithm or representation. The target agent then estimates the source agent’s policy and improves upon it. The key contribution of this work is to show that leveraging the target agent’s uncertainty in the source agent’s policy can significantly improve learning in two complex simulated domains, Keepaway and Mario.

    @inproceedings{2017IJCAI-Wang,
    author={Wang, Zhaodong and Taylor, Matthew E.},
    title={{Improving Reinforcement Learning with Confidence-Based Demonstrations}},
    booktitle={{Proceedings of the 26th International Joint Conference on Artificial Intelligence ({IJCAI})}},
    month={August},
    year={2017},
    note={26% acceptance rate},
    bib2html_pubtype={Refereed Conference},
    bib2html_rescat={Reinforcement Learning},
    abstract={Reinforcement learning has had many successes, but in practice it often requires significant amounts of data to learn high-performing policies. One common way to improve learning is to allow a trained (source) agent to assist a new (target) agent. The goals in this setting are to 1) improve the target agent's performance, relative to learning unaided, and 2) allow the target agent to outperform the source agent. Our approach leverages source agent demonstrations, removing any requirements on the source agent's learning algorithm or representation. The target agent then estimates the source agent's policy and improves upon it. The key contribution of this work is to show that leveraging the target agent's uncertainty in the source agent's policy can significantly improve learning in two complex simulated domains, Keepaway and Mario.}
    }

  • Yunxiang Ye, Zhaodong Wang, Dylan Jones, Long He, Matthew E. Taylor, Geoffrey A. Hollinger, and Qin Zhang. Bin-Dog: A Robotic Platform for Bin Management in Orchards. Robotics, 6(2), 2017.
    [BibTeX] [Abstract] [Download PDF] [DOI]

    Bin management during apple harvest season is an important activity for orchards. Typically, empty and full bins are handled by tractor-mounted forklifts or bin trailers in two separate trips. In order to simplify this work process and improve work efficiency of bin management, the concept of a robotic bin-dog system is proposed in this study. This system is designed with a “go-over-the-bin” feature, which allows it to drive over bins between tree rows and complete the above process in one trip. To validate this system concept, a prototype and its control and navigation system were designed and built. Field tests were conducted in a commercial orchard to validate its key functionalities in three tasks including headland turning, straight-line tracking between tree rows, and “go-over-the-bin.” Tests of the headland turning showed that bin-dog followed a predefined path to align with an alleyway with lateral and orientation errors of 0.02 m and 1.5°. Tests of straight-line tracking showed that bin-dog could successfully track the alleyway centerline at speeds up to 1.00 m·s⁻¹ with a RMSE offset of 0.07 m. The navigation system also successfully guided the bin-dog to complete the task of go-over-the-bin at a speed of 0.60 m·s⁻¹. The successful validation tests proved that the prototype can achieve all desired functionality.

    @article{2017Robotics-Ye,
    author={Ye, Yunxiang and Wang, Zhaodong and Jones, Dylan and He, Long and Taylor, Matthew E. and Hollinger, Geoffrey A. and Zhang, Qin},
    title={{Bin-Dog: A Robotic Platform for Bin Management in Orchards}},
    journal={{Robotics}},
    volume={6},
    year={2017},
    number={2},
    url={http://www.mdpi.com/2218-6581/6/2/12},
    issn={2218-6581},
    doi={10.3390/robotics6020012},
    abstract={Bin management during apple harvest season is an important activity for orchards. Typically, empty and full bins are handled by tractor-mounted forklifts or bin trailers in two separate trips. In order to simplify this work process and improve work efficiency of bin management, the concept of a robotic bin-dog system is proposed in this study. This system is designed with a “go-over-the-bin” feature, which allows it to drive over bins between tree rows and complete the above process in one trip. To validate this system concept, a prototype and its control and navigation system were designed and built. Field tests were conducted in a commercial orchard to validate its key functionalities in three tasks including headland turning, straight-line tracking between tree rows, and “go-over-the-bin.” Tests of the headland turning showed that bin-dog followed a predefined path to align with an alleyway with lateral and orientation errors of 0.02 m and 1.5°. Tests of straight-line tracking showed that bin-dog could successfully track the alleyway centerline at speeds up to 1.00 m·s⁻¹ with a RMSE offset of 0.07 m. The navigation system also successfully guided the bin-dog to complete the task of go-over-the-bin at a speed of 0.60 m·s⁻¹. The successful validation tests proved that the prototype can achieve all desired functionality.}
    }
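
A note on the COACH result above: its core idea is that the human's scalar feedback plays the role the advantage term usually plays in an actor-critic update. Below is a minimal sketch of that flavor of update, assuming a linear-softmax actor; the actor form, names, and learning rate are my simplifications, and the published algorithm includes further machinery (such as eligibility traces) omitted here.

    import numpy as np

    def coach_style_update(theta, features, action, feedback, lr=0.05):
        """One policy update from a scalar human feedback signal.
        theta: (n_actions, n_features) weights of a linear-softmax actor;
        feedback: e.g. +1 (good) or -1 (bad) from the trainer."""
        logits = theta @ features
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Gradient of log pi(action | state) for a linear-softmax policy:
        # d/d theta_b = features * (1[b == action] - pi(b | state)).
        grad = -np.outer(probs, features)
        grad[action] += features
        # The human's feedback stands in for the advantage estimate.
        return theta + lr * feedback * grad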

2016

  • Zhaodong Wang and Matthew E. Taylor. Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study. In AAAI 2016 Spring Symposium, March 2016.
    [BibTeX] [Abstract] [Download PDF]

    There are many successful methods for transferring information from one agent to another. One approach, taken in this work, is to have one (source) agent demonstrate a policy to a second (target) agent, and then have that second agent improve upon the policy. By allowing the target agent to observe the source agent’s demonstrations, rather than relying on other types of direct knowledge transfer like Q-values, rules, or shared representations, we remove the need for the agents to know anything about each other’s internal representation or have a shared language. In this work, we introduce a refinement to HAT, an existing transfer learning method, by integrating the target agent’s confidence in its representation of the source agent’s policy. Results show that a target agent can effectively 1) improve its initial performance relative to learning without transfer (jumpstart) and 2) improve its performance relative to the source agent (total reward). Furthermore, both the jumpstart and total reward are improved with this new refinement, relative to learning without transfer and relative to learning with HAT.

    @inproceedings{2016AAAI-SSS-Wang,
    author={Zhaodong Wang and Matthew E. Taylor},
    title={{Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study}},
    booktitle={{{AAAI} 2016 Spring Symposium}},
    month={March},
    year={2016},
    bib2html_pubtype={Refereed Workshop or Symposium},
    abstract={There are many successful methods for transferring information from one agent to another. One approach, taken in this work, is to have one (source) agent demonstrate a policy to a second (target) agent, and then have that second agent improve upon the policy. By allowing the target agent to observe the source agent's demonstrations, rather than relying on other types of direct knowledge transfer like Q-values, rules, or shared representations, we remove the need for the agents to know anything about each other's internal representation or have a shared language. In this work, we introduce a refinement to HAT, an existing transfer learning method, by integrating the target agent's confidence in its representation of the source agent's policy. Results show that a target agent can effectively 1) improve its initial performance relative to learning without transfer (jumpstart) and 2) improve its performance relative to the source agent (total reward). Furthermore, both the jumpstart and total reward are improved with this new refinement, relative to learning without transfer and relative to learning with HAT.}
    }

2015

  • Yawei Zhang, Yunxiang Ye, Zhaodong Wang, Matthew E. Taylor, Geoffrey A. Hollinger, and Qin Zhang. Intelligent In-Orchard Bin-Managing System for Tree Fruit Production. In Proceedings of the Robotics in Agriculture workshop (at ICRA), May 2015.
    [BibTeX] [Abstract] [Download PDF]

    The labor-intensive nature of harvest in the tree fruit industry makes it particularly sensitive to labor shortages. Technological innovation is thus critical in order to meet current demands without significantly increasing prices. This paper introduces a robotic system to help human workers during fruit harvest. A second-generation prototype is currently being built and simulation results demonstrate potential improvement in productivity.

    @inproceedings{2015ICRA-Zhang,
    author={Yawei Zhang and Yunxiang Ye and Zhaodong Wang and Matthew E. Taylor and Geoffrey A. Hollinger and Qin Zhang},
    title={{Intelligent In-Orchard Bin-Managing System for Tree Fruit Production}},
    booktitle={{Proceedings of the Robotics in Agriculture workshop (at {ICRA})}},
    month={May},
    year={2015},
    bib2html_pubtype={Refereed Workshop or Symposium},
    abstract={The labor-intensive nature of harvest in the tree fruit industry makes it particularly sensitive to labor shortages. Technological innovation is thus critical in order to meet current demands without significantly increasing prices. This paper introduces a robotic system to help human workers during fruit harvest. A second-generation prototype is currently being built and simulation results demonstrate potential improvement in productivity.}
    }