Markov decision processes (MDPs) are models of dynamic decision making
under uncertainty. These models arise in diverse applications and
have been developed extensively in fields such as operations
research, control engineering, and the decision sciences in general.
Recent research, especially in artificial intelligence, has
highlighted the significance of studying the computational
properties of MDP problems.
We address computational complexity questions in solving MDP problems
under infinite-horizon optimization criteria. In an
infinite-horizon or an indefinite-horizon criterion, we are
interested in making good decisions for the long run or an
indefinite period of time. A distinction that partitions the thesis
is the assumption on the observability of the state of the system
with which the decision maker interacts.
In the first half of the thesis, we focus on partially observable or
POMDP problems, wherein there is uncertainty about the current state
of the system at times of decision. We show that many POMDP
problems are undecidable. We make use of an undecidability result
in research on probabilistic finite automata to establish our
results.
In the second half of the thesis, we focus on fully observable MDP
problems wherein the state of the system at each time point is
known. These MDPs are solvable in polynomial time because they can
be expressed as special linear programs. However, preferred
techniques for solving them are simple dynamic programming value and
policy iteration methods. We analyze these algorithms on restricted
classes of problems, including deterministic MDPs and MDPs whose
linear programming formulations take two variables per inequality,
and establish that several algorithms based on value or policy
iteration are polynomial. Tools of analysis include studying
parametric versions of the problems and viewing policy iteration as
a Newton's method for finding the zero of a function. We expect
that extending the analysis of policy iteration as a Newton's method
will establish policy iteration to be polynomial on general fully
observable MDPs.