TY - JOUR
T1 - Technical note: A computationally efficient algorithm for undiscounted Markov Decision processes with restricted observations
AU - Davis, Lauren
AU - Hodgson, Thom J.
AU - King, Russell E.
AU - Wei, Wenbin
PY - 2009/2/1
Y1 - 2009/2/1
N2 - We present a computationally efficient procedure to determine control policies for an infinite horizon Markov Decision process with restricted observations. The optimal policy for the system with restricted observations is a function of the observation process and not the unobservable states of the system. Thus, the policy is stationary with respect to the partitioned state space. The algorithm we propose addresses the undiscounted average cost case. The algorithm combines a local search with a modified version of Howard's (Dynamic programming and Markov processes, MIT Press, Cambridge, MA, 1960) policy iteration method. We demonstrate empirically that the algorithm finds the optimal deterministic policy for over 96% of the problem instances generated. For large scale problem instances, we demonstrate that the average cost associated with the local optimal policy is lower than the average cost associated with an integer rounded policy produced by the algorithm of Serin and Kulkarni Math Methods Oper Res 61 (2005) 311-328. © 2008 Wiley Periodicals, Inc.
AB - We present a computationally efficient procedure to determine control policies for an infinite horizon Markov Decision process with restricted observations. The optimal policy for the system with restricted observations is a function of the observation process and not the unobservable states of the system. Thus, the policy is stationary with respect to the partitioned state space. The algorithm we propose addresses the undiscounted average cost case. The algorithm combines a local search with a modified version of Howard's (Dynamic programming and Markov processes, MIT Press, Cambridge, MA, 1960) policy iteration method. We demonstrate empirically that the algorithm finds the optimal deterministic policy for over 96% of the problem instances generated. For large scale problem instances, we demonstrate that the average cost associated with the local optimal policy is lower than the average cost associated with an integer rounded policy produced by the algorithm of Serin and Kulkarni Math Methods Oper Res 61 (2005) 311-328. © 2008 Wiley Periodicals, Inc.
KW - Heuristics
KW - Markov Decision process
KW - Optimal control
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=60349085277&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=60349085277&origin=inward
U2 - 10.1002/nav.20329
DO - 10.1002/nav.20329
M3 - Article
SN - 0894-069X
VL - 56
SP - 86
EP - 92
JO - Naval Research Logistics (NRL)
JF - Naval Research Logistics (NRL)
IS - 1
ER -