
Iterative ADP learning algorithms for discrete-time multi-player games


Artificial Intelligence Review, Volume 50 (1) – Jan 12, 2018



Publisher: Springer Journals
Copyright: © 2018 Springer Science+Business Media B.V., part of Springer Nature
Subject: Computer Science; Artificial Intelligence (incl. Robotics); Computer Science, general
ISSN: 0269-2821
eISSN: 1573-7462
DOI: 10.1007/s10462-017-9603-1

Abstract

Adaptive dynamic programming (ADP) is an important branch of reinforcement learning for solving various optimal control problems. Most practical nonlinear systems are controlled by more than one controller. Each controller is a player, and balancing the cooperation and conflict among these players can be viewed as a game. Multi-player games fall into two main categories: zero-sum games and non-zero-sum games. To obtain the optimal control policy for each player, one must solve the Hamilton–Jacobi–Isaacs equation for zero-sum games and a set of coupled Hamilton–Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to solve analytically. To overcome this bottleneck, this paper proposes two ADP methods: a modified gradient-descent-based online algorithm and a novel iterative offline learning approach. Furthermore, to implement the proposed methods we employ a single-network structure, which substantially reduces the computational burden compared with the traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.
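Only the abstract is reproduced on this page, so the paper's precise update rules are not available here. As a rough illustration of the kind of iteration the abstract describes, the Python sketch below runs a value-iteration-style ADP recursion for a discrete-time linear-quadratic zero-sum game. Everything in it is an assumption for illustration: the system matrices A, B, D, the weights Q and R, the attenuation level gamma2, and the stopping tolerance are invented, and the quadratic critic V(x) = x'Px merely stands in for the paper's single-network structure; this is not the authors' algorithm.

import numpy as np

# Bellman-Isaacs equation iterated below (discrete-time zero-sum game):
#   V(x) = min_u max_w [ x'Qx + u'Ru - gamma^2 w'w + V(Ax + Bu + Dw) ]
# A single quadratic critic V(x) = x'Px plays the role of the single
# network; both players' gains are derived from P at every step.

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])      # illustrative dynamics x_{k+1} = Ax + Bu + Dw
B = np.array([[0.0], [1.0]])    # control input channel
D = np.array([[0.1], [0.0]])    # disturbance input channel
Q = np.eye(2)                   # state cost weight
R = np.eye(1)                   # control cost weight
gamma2 = 5.0                    # attenuation level gamma^2 (illustrative)

P = np.zeros((2, 2))            # critic parameters, V(x) = x'Px
for _ in range(500):
    # Saddle-point conditions of the quadratic Hamiltonian give linear
    # equations for the stacked gains [Ku; Kw], with u = -Ku x, w = -Kw x.
    M = np.block([[R + B.T @ P @ B,            B.T @ P @ D],
                  [D.T @ P @ B,  D.T @ P @ D - gamma2 * np.eye(1)]])
    N = np.vstack([B.T @ P @ A, D.T @ P @ A])
    K = np.linalg.solve(M, N)
    Ku, Kw = K[:1], K[1:]
    Acl = A - B @ Ku - D @ Kw   # closed-loop dynamics under both policies
    # Value-iteration update of the critic (a game Riccati recursion)
    P_new = Q + Ku.T @ R @ Ku - gamma2 * Kw.T @ Kw + Acl.T @ P @ Acl
    if np.max(np.abs(P_new - P)) < 1e-9:
        P = P_new
        break
    P = P_new

print("Converged critic parameters P:\n", P)

Deriving both players' gains from the one critic matrix P at every iteration mirrors the appeal of the single-network idea mentioned in the abstract: no separate actor approximators have to be trained or kept in sync.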

Journal

Artificial Intelligence Review, Springer Journals

Published: Jan 12, 2018
