We propose approximating a Poincaré map of biped walking dynamics using Gaussian processes, and we locally optimize the parameters of a given biped walking controller based on the approximated map. Gaussian processes estimate a probability distribution over a target nonlinear function with a given covariance, so the optimization method can take the uncertainty of the approximated map into account throughout the learning process. We use reinforcement learning (RL) as the optimization method. Although RL is a useful nonlinear optimizer, it is usually difficult to apply to real robotic systems because of the large number of iterations required to acquire suitable policies. In this study, we first approximate the Poincaré map from data collected on a real robot, and then apply RL using the estimated map to optimize stepping and walking policies. We show that stepping and walking policies improve in both simulated and real environments, and we present experimental validation of the approach on a humanoid robot.
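As a rough illustration of the idea in the abstract, the sketch below fits a Gaussian process to noisy samples of a one-dimensional Poincaré return map and queries both a predictive mean and a predictive variance. The particular map `true_map`, the kernel hyperparameters, and the noise level are hypothetical stand-ins, not the paper's actual dynamics or settings; the point is only that the GP posterior variance gives the uncertainty signal that a policy-optimization step could exploit.

```python
import numpy as np

def rbf_kernel(A, B, ell=0.3, sf=1.0):
    # Squared-exponential covariance between two sets of 1-D section states.
    d = A[:, None] - B[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell) ** 2)

# Hypothetical 1-D Poincaré return map x_{k+1} = f(x_k), e.g. a section
# coordinate of the walking dynamics with a stable fixed point.
def true_map(x):
    return 0.6 * x + 0.2 * np.sin(3.0 * x) + 0.1

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 25)                          # observed section crossings
y = true_map(X) + 0.02 * rng.standard_normal(X.shape)   # noisy next crossings

sn = 0.02                                   # assumed observation-noise std
K = rbf_kernel(X, X) + sn**2 * np.eye(len(X))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

def gp_predict(Xs):
    # Posterior mean and variance of the approximated map at query states Xs.
    Ks = rbf_kernel(Xs, X)
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = rbf_kernel(Xs, Xs).diagonal() - np.sum(v**2, axis=0)
    return mu, var

xs = np.linspace(-1.0, 1.0, 9)
mu, var = gp_predict(xs)
# The predictive variance lets the optimizer discount regions of the
# section where the learned map is uncertain, which is the mechanism the
# abstract refers to when it says the uncertainty is "taken into account".
```

In the paper's setting the map would be fit from real-robot crossing data rather than a synthetic function, and the query states would come from rollouts of the stepping or walking policy being optimized.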
Autonomous Robots – Springer Journals
Published: Sep 1, 2009