On the Expected Total Reward with Unbounded Returns for Markov Decision Processes



Applied Mathematics and Optimization , Volume OnlineFirst – Oct 23, 2018


References (19)

Publisher
Springer Journals
Copyright
Copyright © 2018 by Springer Science+Business Media, LLC, part of Springer Nature
Subject
Mathematics; Calculus of Variations and Optimal Control; Optimization; Systems Theory, Control; Theoretical, Mathematical and Computational Physics; Mathematical Methods in Physics; Numerical and Computational Physics, Simulation
ISSN
0095-4616
eISSN
1432-0606
DOI
10.1007/s00245-018-9533-6

Abstract

We consider a discrete-time Markov decision process with Borel state and action spaces. The performance criterion is to maximize a total expected utility determined by an unbounded return function. The existence of optimal strategies is shown under general conditions that allow the reward function to be unbounded both from above and below, and the action sets available to the decision maker at each step to be not necessarily compact. To deal with unbounded reward functions, a new characterization of the weak convergence of probability measures is derived. Our results are illustrated by examples.

Keywords
Markov decision processes · Expected total reward · Unbounded return · Weak convergence of measures

Mathematics Subject Classification
90C40 · 60J05

1 Introduction

In this paper, our objective is to provide sufficient conditions for the existence of optimal strategies in dynamic programming decision models under the expected total reward criterion. The model under consideration is rather general, since the reward function may be unbounded both from above and below, and the action sets available to the decision maker at each step need not be compact.

A. Genadot alexandre.genadot@math.u-bordeaux.fr · F. Dufour francois.dufour@math.u-bordeaux.fr
Institut Polytechnique de Bordeaux, INRIA Bordeaux Sud Ouest, Team: CQFD, IMB, Institut de Mathématiques
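The expected total reward criterion described above can be illustrated on a much simpler setting than the one treated in the paper. The following is a minimal sketch on a hypothetical finite MDP (3 states, 2 actions, bounded rewards, with an absorbing zero-reward state so that the undiscounted total-reward value iteration converges); it is only an illustration of the criterion V(s) = max_a [ r(s,a) + Σ_{s'} P(s'|s,a) V(s') ], not of the paper's Borel-space, unbounded-reward results.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions; state 2 is absorbing
# with zero reward, so total (undiscounted) rewards stay finite.
# P[a][s, s'] = transition probability under action a.
P = np.array([
    [[0.0, 0.8, 0.2],   # action 0
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]],
    [[0.1, 0.0, 0.9],   # action 1
     [0.0, 0.2, 0.8],
     [0.0, 0.0, 1.0]],
])
# r[a][s] = one-step reward; no reward in the absorbing state.
r = np.array([
    [1.0, 2.0, 0.0],
    [0.5, 3.0, 0.0],
])

# Value iteration for the expected total reward:
# V(s) = max_a [ r(s,a) + sum_{s'} P(s'|s,a) V(s') ].
V = np.zeros(3)
for _ in range(1000):
    Q = r + P @ V          # Q[a, s] = r(s,a) + E[V(next state)]
    V_new = Q.max(axis=0)  # maximize over actions
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

# An optimal stationary deterministic strategy for this toy model.
policy = Q.argmax(axis=0)
print("optimal values:", V)   # approx [4.2, 4.0, 0.0]
print("optimal policy:", policy)
```

In this example both optimal actions turn out to be action 0: from state 1, staying with probability 0.5 and collecting reward 2 per step accumulates a total value of 4, which beats the one-shot reward 3 of action 1.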

Journal

Applied Mathematics and Optimization · Springer Journals

Published: Oct 23, 2018
