A list of possible topics is included here. Students are also
encouraged to present work related to their own research interests,
provided it is relevant to the course material.
On Markov Chains, Martingales, and Stochastic Stabilization
- Chapter 13 of Meyn and Tweedie.
- R. L. Tweedie. Drift conditions and invariant measures for Markov
chains. Stoch. Process. Appl., 92(2):345-354, 2001.
- O. Hernandez-Lerma and J. B. Lasserre. Ergodic theorems and
ergodic decomposition for Markov chains. Acta Applicandae
Mathematica, pp. 99-119, 1998.
- J. B. Lasserre. Invariant probabilities for Markov chains on a
metric space. Statistics and Probability Letters, vol. 34, pp.
259-265, 1997.
- R. Douc, G. Fort, E. Moulines, and P. Soulier. Practical drift
conditions for subgeometric rates of convergence. Ann. Appl. Probab.,
14:1353-1377, 2004.
- S. B. Connor and G. Fort. State-dependent Foster-Lyapunov criteria
for subgeometric convergence of Markov chains. Stoch. Process. Appl.,
119:4176-4193, 2009.
- G. O. Roberts and J. S. Rosenthal. General state space Markov chains
and MCMC algorithms. Probability Surveys, 20-71, 2004.
- L. Breiman. The individual ergodic theorem of information theory,
Annals of Mathematical Statistics, pp. 809-811, 1957.
- R. Pemantle and J. S. Rosenthal. Moment conditions for a sequence
with negative drift to be uniformly bounded in l_r. Stochastic
Processes and their Applications, 82(1):143-155, 1999.
- B. Hajek. Hitting-time and occupation-time bounds implied by drift
analysis with applications. Advances in Applied Probability, pages
502-525, 1982.
- P. Tuominen and R. L. Tweedie. Subgeometric rates of convergence of
f-ergodic Markov chains. Adv. Appl. Prob., 26(3):775-798, September
1994.
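
A recurring object in the references above is the Foster-Lyapunov
drift condition of Chapter 13 of Meyn and Tweedie. For orientation,
its (f, V)-form asks for a function V : X -> [0, infinity), a function
f >= 1, a suitable "small" set C, and a constant b < infinity such that

    E[ V(X_{t+1}) | X_t = x ] <= V(x) - f(x) + b 1_C(x)    for all x in X.

Choosing f proportional to V yields geometric ergodicity, while the
Douc-Fort-Moulines-Soulier and Connor-Fort entries weaken the choice
of f to obtain subgeometric rates.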
On Dynamic Programming and Optimization Methods
- C. J. Himmelberg, T. Parthasarathy, and F. S. Van Vleck.
Optimal plans for dynamic programming problems. Mathematics of
Operations Research, pp. 390-394, 1976.
- M. Schal. Conditions for optimality in dynamic programming and for
the limit of n-stage optimal policies to be optimal. Z.
Wahrscheinlichkeitstheorie verw. Gebiete, pp. 179-196, 1975.
- V. S. Borkar. Average cost dynamic programming equations for
controlled Markov chains with partial observations. SIAM Journal on
Control and Optimization, 39(3):673-681, 2000.
- D. Blackwell. Memoryless Strategies in Finite-Stage Dynamic
Programming. Annals of Math. Statist. Volume 35, Number 2, 863-865,
1964.
- J. B. Lasserre. Sample-path average optimality for Markov control
processes. IEEE Trans. Automatic Contr., 44, pp. 1966-1971, 1999.
- C. Striebel. Sufficient statistics in the optimum control of
stochastic systems. J. Math. Anal. Appl., vol. 12, pp. 576-592,
1965.
- K. W. Ross. Randomized and past-dependent policies for Markov
decision processes with finite action set. Operations Research,
37:474-477, 1989.
- O. Hernandez-Lerma, J. Gonzales-Hernandez, and R. R.
Lopez-Martinez. Constrained average cost Markov control processes in
Borel spaces. SIAM J. Cont. and Opt. v. 42, pp. 442-468, 2003.
- T. T. Georgiou and A. Lindquist. The separation principle in
stochastic control, redux. IEEE Transactions on Automatic
Control, 58(10):2481-2494, 2013.
- S.E. Shreve and D.P. Bertsekas. Universally measurable policies in
dynamic programming. Math. Oper. Res., 4(1):15–30, Feb 1979.
On Learning and Approximation Algorithms for Markov Decision Processes
- E. Della Vecchia, S. Di Marco, and A. Jean-Marie. Illustrated
review of convergence conditions of the value iteration algorithm
and the rolling horizon procedure for average-cost MDPs. Annals of
Operations Research, pp. 193-214, 2012.
- H. Langen. Convergence of dynamic programming models. Math. Oper.
Res., 6(4):493–512, 1981.
- B. Van Roy. Performance loss bounds for approximate value iteration
with state aggregation. Math. Oper. Res., 31(2):234-244, May 2006.
- O. Hernandez-Lerma and J. B. Lasserre. Error bounds for rolling
horizon policies in discrete-time Markov control processes. IEEE
Transactions on Automatic Control, 35:1118-1124, 1990.
- D. P. Bertsekas. A new value iteration method for the average cost
dynamic programming problem. SIAM Journal on Control and
Optimization, 36(2):742-759, 1998.
- J. N. Tsitsiklis. Asynchronous Stochastic Approximation and
Q-learning. Machine Learning, 16, 1994, pp. 185-202.
- T. Jaakkola, M. I. Jordan and S. P. Singh. On the convergence of
stochastic iterative dynamic programming algorithms. Neural
computation, 6(6), 1185-1201, 1994.
- E. Even-Dar and Y. Mansour. Learning rates for Q-learning. The
Journal of Machine Learning Research, 5:1-25, 2004.
- V. S. Borkar and S. P. Meyn. The ODE method for convergence of
stochastic approximation and reinforcement learning. SIAM Journal on
Control and Optimization, pp. 447-469, 2000.
- S. Mannor and J. N. Tsitsiklis. On the empirical state-action
frequencies in Markov decision processes under general policies.
Mathematics of Operations Research, 30(3):545, 2005.
- J. N. Tsitsiklis and B. Van Roy. Average Cost Temporal-Difference
Learning. Proceedings of the IEEE Conference on Decision and
Control, 1997.
- G. Neu, A. György, C. Szepesvári, and A. Antos. Online Markov
Decision Processes under Bandit Feedback. IEEE Transactions on
Automatic Control, 59, pp. 676-691, 2013.
- B. Hajek. A tutorial survey of theory and applications of
simulated annealing. Conference on Decision and Control, 1985.
- B. Hajek. Cooling schedules for optimal annealing. Math.
Operations Research, 1988.
- H. S. Chang and S. I. Marcus. Approximate receding horizon approach
for Markov decision processes: average reward case. Journal of
Mathematical Analysis and Applications, pp. 636-651, 2003.
- O. Hernández-Lerma. Finite-state approximations for denumerable
multidimensional state discounted Markov decision processes. J. Math.
Anal. Appl., vol. 113, pp. 382-388, 1986.
- L. Sennott. The computation of average optimal policies in
denumerable state Markov decision chains. Advances in Applied
Probability, pp. 114-137, 1997.
- D. White. Finite-state approximations for denumerable state
infinite horizon discounted Markov decision processes. J. Math.
Anal. Appl., vol. 74, pp. 292-295, 1980.
- F. Dufour and T. Prieto-Rumeau. Approximation of average cost Markov
decision processes using empirical distributions and concentration
inequalities. Stochastics, pp. 1-35, 2014.
- E. Zhou, M. C. Fu, and S. I. Marcus. Solving continuous-state
POMDPs via density projection. IEEE Transactions on Automatic
Control, 55(5):1101–1116, 2010.
- S. Geman and D. Geman, "Stochastic Relaxation, Gibbs
Distributions, and the Bayesian Restoration of Images," Readings in
Computer Vision, pp. 564-584, 1987.
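
As a point of reference for the Q-learning entries above (Tsitsiklis;
Jaakkola, Jordan, and Singh; Even-Dar and Mansour), the tabular
algorithm they analyze fits in a few lines. The sketch below is
illustrative only: the three-state MDP is randomly generated
placeholder data, and the harmonic step-size schedule is just one
standard choice satisfying the usual stochastic approximation
conditions, not the schedule of any particular paper.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))          # per-stage cost (placeholder data)
beta = 0.9                                                        # discount factor

Q = np.zeros((n_states, n_actions))
visits = np.zeros((n_states, n_actions))
s = 0
for _ in range(200000):
    a = rng.integers(n_actions)           # uniform exploration: every pair (s, a) is visited infinitely often
    s_next = rng.choice(n_states, p=P[s, a])
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]            # step sizes with sum alpha = infinity, sum alpha^2 < infinity
    # Asynchronous stochastic approximation step toward the Bellman operator.
    Q[s, a] += alpha * (cost[s, a] + beta * Q[s_next].min() - Q[s, a])
    s = s_next

print("learned Q-factors:\n", Q)
print("greedy policy:", Q.argmin(axis=1))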
On Filtering Problems and Applications
- A. Budhiraja and D. Ocone. Exponential stability of discrete-time
filters for bounded observation noise. Systems & Control Letters,
vol. 30, pp. 185-193, 1997.
- Z. Khan, T. Balch, and F. Dellaert. MCMC-based particle
filtering for tracking a variable number of interacting targets.
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27,
pp. 1805-1819, 2005.
- A. Doucet, N. J. Gordon, and V. Krishnamurthy. Particle filters
for state estimation of jump Markov linear systems. IEEE Trans.
Signal Processing, vol. 49, pp. 613-624, 2001.
- R. Van Handel. Discrete time nonlinear filters with informative
observations are stable. Electron. Commun. Probab., vol. 13,
pp. 562-575, 2008.
- P. Chigansky, R. Liptser, and R. van Handel. Intrinsic methods
in filter stability. Handbook of Nonlinear Filtering, 2009.
- D. Crisan and A. Doucet. A survey of convergence results on
particle filtering methods for practitioners. IEEE Transactions on
Signal Processing, 50(3):736-746, 2002.
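
For orientation, the bootstrap particle filter that the Crisan-Doucet
survey analyzes can be sketched in a few lines. The scalar
linear-Gaussian model below is a hypothetical placeholder chosen only
to keep the sketch self-contained and runnable.

import numpy as np

rng = np.random.default_rng(1)
T, N = 50, 1000   # time horizon and number of particles

# Placeholder model: X_t = 0.8 X_{t-1} + W_t, Y_t = X_t + V_t, with W_t ~ N(0, 1), V_t ~ N(0, 0.25).
x_true = np.zeros(T)
x_true[0] = rng.normal()
for t in range(1, T):
    x_true[t] = 0.8 * x_true[t - 1] + rng.normal()
y = x_true + 0.5 * rng.normal(size=T)

particles = rng.normal(size=N)   # samples from the prior on X_0
estimates = np.zeros(T)
for t in range(T):
    if t > 0:
        particles = 0.8 * particles + rng.normal(size=N)  # propagate through the transition kernel
    w = np.exp(-0.5 * ((y[t] - particles) / 0.5) ** 2)    # reweight by the observation likelihood
    w /= w.sum()
    estimates[t] = w @ particles                          # approximates E[X_t | Y_0, ..., Y_t]
    particles = particles[rng.choice(N, size=N, p=w)]     # multinomial resampling against weight degeneracy

print("mean squared filtering error:", np.mean((estimates - x_true) ** 2))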
Decentralized Stochastic Systems: Teams and Games
- H. S. Witsenhausen. Separation of estimation and control for
discrete time systems. Proc. IEEE, vol. 59, 1971.
- H. S. Witsenhausen. A counterexample in stochastic optimum
control. SIAM Journal on Control, vol. 6, pp. 131-147, 1968.
- N. Sandell and M. Athans. Solution of some nonclassical LQG
stochastic decision problems. IEEE Trans. Automatic Control,
vol. 19, pp. 108-116, 1974.
- H. S. Witsenhausen. Equivalent stochastic control problems.
Mathematics of Control, Signals, and Systems, vol. 1, pp. 3-11,
Springer-Verlag, 1988.
- H. S. Witsenhausen. A standard form for sequential stochastic
control. Math. Syst. Theor., pp. 5-11, 1973.
- D. Teneketzis, On information structures and nonsequential
stochastic control, CWI Quart., 9 (1996), pp. 241–260.
- R. Radner. Team decision problems. Annals of Mathematical
Statistics, vol. 33, pp. 857-881, 1962.
- J.C. Krainak and S.I. Marcus. Static team problems - part I:
Sufficient conditions and the exponential cost criterion. IEEE
Trans. Automat. Contr., vol. 27, pp. 839-848, April 1982.
- Chapter 5 of the dissertation of J. N. Tsitsiklis, Problems in
Decentralized Decision Making and Computation, Ph.D. Thesis, EECS,
MIT, 1984.
- J. S. Shamma and G. Arslan. Dynamic fictitious play, dynamic
gradient play, and distributed convergence to Nash equilibria. IEEE
Trans. Automat. Control, vol. 50, no. 3, pp. 312-327, Mar. 2005.
- J. N. Tsitsiklis and M. Athans. Convergence and asymptotic
agreement in distributed decision problems. IEEE Trans Autom.
Control, vol. AC-29, no. 1, pp. 42-50, Jan. 1984.
- V. Borkar and P. Varaiya. Asymptotic agreement in distributed
estimation. IEEE Trans. Automat. Contr., pp. 650-655, June 1982.
- L. T. Nielsen. Common knowledge, communication, and convergence of
beliefs. Math. Soc. Sci., 8:1-14, 1984.
- O. Gossner. Simple Bounds on the Value of a Reputation.
Econometrica, 79(5):1627-1641, 2011.
- W. Sandholm. Stochastic Evolutionary Game Dynamics: Foundations,
Deterministic Approximation, and Equilibrium Selection. In Evolutionary
Game Dynamics, K. Sigmund, ed., pp. 111-141, American Mathematical
Society, 2011.
- E. Kalai and E. Lehrer. Weak and strong merging of opinions.
Journal of Mathematical Economics, 23:73-100, 1994.
- J. Nachbar. Beliefs in repeated games. Econometrica, 73:459-480,
2005.
- J. Nachbar. Learning in Games. In R. A. Meyers and P. Kokol, eds.,
Computational Complexity: Theory, Techniques, and Applications,
pp. 1695-1705, Springer New York, 2012.
- B. Jovanovic and R. W. Rosenthal. Anonymous sequential games.
Journal of Mathematical Economics, 1988.
- T. Basar. Decentralized multicriteria optimization of linear
stochastic systems. IEEE Transactions on Automatic Control, vol.
23, no. 2, pp. 233-243, April 1978.
Robustness in Stochastic Control
- D. Jacobson. Optimal stochastic linear systems with exponential
performance criteria and their relation to deterministic
differential games. IEEE Transactions on Automatic Control,
18(2):124-131, 1973.
- P. Dai Pra, L. Meneghini, and W. J. Runggaldier. Connections
between stochastic control and dynamic games. Mathematics of
Control, Signals and Systems, 9(4):303-326, 1996.
- P. Dupuis, M. R. James, and I. Petersen. Robust properties of
risk-sensitive control. Mathematics of Control, Signals and Systems,
13(4):318-332, 2000.
- P. Whittle. A risk-sensitive maximum principle: The case of
imperfect state observation. IEEE Transactions on Automatic
Control, 36(7):793-801, 1991.
- I. Tzortzis, C. D. Charalambous, and T. Charalambous.
Dynamic programming subject to total variation distance ambiguity.
SIAM Journal on Control and Optimization, pp. 2040-2075, 2015.
On Structural Results for Policies and Applications in Communications, Networks, Finance
- J. C. Walrand and P. Varaiya. Optimal causal
coding-decoding problems. IEEE Trans. Inform. Theory, 29:814-820,
November 1983 (this will be supplemented by a paper or more,
depending on your group).
- S. Tatikonda and S. Mitter. The capacity of channels with
feedback. IEEE Trans. Inform. Theory, 323-349, 2009.
- A. Nayyar, T. Basar, D.Teneketzis and V.V. Veeravalli. Optimal
Strategies for Communication and Remote Estimation with an Energy
Harvesting Sensor. IEEE Transactions on Automatic Control, Vol. 58,
No. 9, pp. 2246-2260, September, 2013.
- W. Wu and A. Arapostathis. Optimal sensor querying: General
Markovian and LQG models with controlled observations. IEEE
Transactions on Automatic Control, vol. 53, no. 6, pp. 1392-1405,
July 2008.
- R. Agrawal, M. V. Hegde, and D. Teneketzis. Asymptotically
efficient adaptive allocation rules for the multiarmed bandit
problem with switching cost. IEEE Transactions on Automatic Control,
33:899-906, 1988.
- B. Hajek, K. Mitzel, and S. Yang. Paging and registration in
cellular networks: jointly optimal policies and an iterative
algorithm. IEEE Transactions on Information Theory, vol. 54, 2008.
Continuous-time Stochastic Control
- First three chapters of 'Stochastic Stability and Control',
by Kushner.
- Chapter 4 of 'Stochastic Stability and Control', by Kushner.
- S. P. Meyn and R. L. Tweedie. Stability of Markovian processes III:
Foster-Lyapunov criteria for continuous-time processes. Adv. Appl.
Probability, pages 518-548, 1993.
- R. J. Elliott. The optimal control of a stochastic system. SIAM
Journal on Control and Optimization, pp. 756-778, 1977.
- M. Davis and P. Varaiya. Dynamic programming conditions for
partially observable stochastic systems. SIAM Journal on Control,
pp. 226-261, 1973.
Some Further Applications
- C. Pasin, F. Dufour, L. Villain, H. Zhang, R. Thiebaut.
Controlling IL-7 injections in HIV-infected patients.
arxiv.org/abs/1801.06227
Implementation of Iterative Algorithms Leading to Convergence to Optimal Policies
- You can also choose to implement a policy iteration algorithm, a
value iteration algorithm, or a linear programming algorithm for a
given sample problem as part of your project. You will be provided
with references. Please talk to the instructor if you wish to
implement such an algorithm and present your results in class. You
will need to include your (Matlab) code in your report; a minimal
sketch of value iteration follows below.
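
For an indication of the expected scope, here is a minimal value
iteration sketch for a discounted finite MDP, written in Python
rather than Matlab purely for compactness on this page; the randomly
generated transition and cost data are hypothetical placeholders for
whatever sample problem you are assigned.

import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 3, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] (placeholder data)
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))          # per-stage cost (placeholder data)
beta = 0.9                                                        # discount factor

V = np.zeros(n_states)
for _ in range(10000):
    # Bellman operator: (TV)(s) = min_a [ c(s, a) + beta * sum_{s'} P(s' | s, a) V(s') ].
    V_new = (cost + beta * P @ V).min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:   # T is a beta-contraction in the sup norm, so this terminates
        V = V_new
        break
    V = V_new

policy = (cost + beta * P @ V).argmin(axis=1)
print("value function:", V)
print("greedy (optimal) policy:", policy)

Policy iteration and the linear programming formulation can be
organized around the same Bellman operator.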
The reports should be less than 8 pages and roughly contain the
following: 1/4 of the report should focus on the problem description,
known results, and a literature review sufficient to reflect your
knowledge of the field; 1/2 of the report on the paper's main
contributions and results (you do not need to give details of the
proofs, but you should convince the reader that you have understood
the material); and 1/4 of the report should be your review and
critique of the paper.