Abstract: We consider a discrete-time Markov decision process with Borel state and action spaces. The performance criterion is to maximize a total expected utility determined by an unbounded return function. We show the existence of optimal strategies under general conditions that allow the reward function to be unbounded both from above and below, and the action sets available to the decision maker at each step to be non-compact. To deal with unbounded reward functions, a new characterization of the weak convergence of probability measures is derived. Our results are illustrated by examples.

Keywords: Markov decision processes · Expected total reward · Unbounded return · Weak convergence of measures

Mathematics Subject Classification: 90C40 · 60J05

1 Introduction

In this paper, our objective is to provide sufficient conditions for the existence of optimal strategies in dynamic programming decision models under the expected total reward criterion. The model under consideration is rather general, since the reward function may be unbounded both from above and below, and the action sets available to the decision maker at each step need not be compact.

A. Genadot alexandre.genadot@math.u-bordeaux.fr F. Dufour francois.dufour@math.u-bordeaux.fr Institut Polytechnique de Bordeaux, INRIA Bordeaux Sud Ouest, Team: CQFD, IMB, Institut de Mathématiques
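For intuition only, the expected-reward optimization the paper studies can be illustrated in the simplest possible setting: a finite-state, finite-action MDP solved by value iteration. This is a minimal sketch, not the paper's construction (the paper works with Borel spaces, unbounded rewards, and non-compact action sets); the transition probabilities and rewards below are hypothetical, and a discount factor is used so the fixed-point iteration converges.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (all numbers illustrative).
# P[a, s, s'] = probability of moving from s to s' under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.7, 0.3]],   # transitions under action 1
])
# r[s, a] = one-step reward for taking action a in state s.
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95                     # discount factor (ensures a contraction)

# Value iteration: V_{n+1}(s) = max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) V_n(s') ]
V = np.zeros(2)
for _ in range(10000):
    Q = np.array([[r[s, a] + gamma * P[a, s] @ V for a in range(2)]
                  for s in range(2)])
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy (stationary) policy w.r.t. the limit values
```

In this compact, discounted setting an optimal stationary policy always exists; the point of the paper is precisely that existence can fail, and must be re-established under weaker hypotheses, once rewards are unbounded and action sets are not compact.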
Applied Mathematics and Optimization – Springer Journals
Published: Oct 23, 2018