Reinforcement Learning for Near-Optimal Design of Zero-Delay Codes for Markov Sources (2311.12609v4)
Abstract: In the classical lossy source coding problem, one encodes long blocks of source symbols, which enables the distortion to approach the ultimate Shannon limit. Such a block-coding approach introduces large delays, which is undesirable in many delay-sensitive applications. We consider the zero-delay case, where the goal is to encode and decode a finite-alphabet Markov source without any delay. It has been shown that this problem lends itself to stochastic control techniques, which lead to existence results, structural results, and general structural approximation results. However, these techniques have so far resulted only in computationally prohibitive algorithmic implementations for code design. To address this problem, we present a reinforcement learning design algorithm and rigorously prove its asymptotic optimality. In particular, we show that a quantized Q-learning algorithm can be used to obtain a near-optimal coding policy for this problem. The proof builds on recent results on quantized Q-learning for weakly Feller controlled Markov chains. Applying these results requires developing supporting technical results on regularity and stability properties, as well as results relating the optimal solutions of the discounted-cost and average-cost infinite-horizon problems. These theoretical results are supported by simulations.
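The quantized Q-learning approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes (i) the standard structural result that the encoder may choose a quantizer Q_t, mapping the source alphabet to the channel alphabet, as a function of the predictor π_t (the conditional distribution of X_t given past channel symbols); (ii) Hamming distortion, with the reconstruction alphabet equal to the source alphabet; (iii) that π_t is quantized to the grid of probability vectors with entries k/n; and (iv) that the per-stage cost fed to Q-learning is the expected distortion given π_t. The function names `quantize_belief`, `stage_cost`, `update_belief`, and `train`, the example source, and all parameter values are hypothetical.

```python
import itertools
import math
import random
from collections import defaultdict


def quantize_belief(pi, n):
    """Map a probability vector to a point of the grid {k/n : k_i >= 0, sum k = n}.

    Rounds n*pi to the nearest integers, then repairs the total so the counts
    sum to n. Returns the counts k as a hashable tuple (the quantized state).
    """
    scaled = [p * n for p in pi]
    k = [int(math.floor(s + 0.5)) for s in scaled]
    delta = sum(k) - n
    # Indices sorted by rounding error k_i - n*pi_i (most rounded-down first).
    order = sorted(range(len(pi)), key=lambda i: k[i] - scaled[i])
    if delta > 0:    # too many counts: undo the largest round-ups
        for i in order[::-1][:delta]:
            k[i] -= 1
    elif delta < 0:  # too few counts: undo the largest round-downs
        for i in order[:-delta]:
            k[i] += 1
    return tuple(k)


def stage_cost(pi, quantizer, num_cells):
    """Expected Hamming distortion c(pi, Q) under the best decoder for each cell."""
    cost = 0.0
    for cell in range(num_cells):
        members = [x for x in range(len(pi)) if quantizer[x] == cell]
        if members:
            # With Hamming distortion the decoder guesses the most likely symbol.
            cost += sum(pi[x] for x in members) - max(pi[x] for x in members)
    return cost


def update_belief(pi, quantizer, q, P):
    """Predictor update: pi_next(y) proportional to sum_x pi(x) 1{Q(x)=q} P[x][y]."""
    nx = len(pi)
    w = [pi[x] if quantizer[x] == q else 0.0 for x in range(nx)]
    z = sum(w)
    if z == 0.0:  # q had zero predicted probability; keep the old belief
        return list(pi)
    return [sum(w[x] * P[x][y] for x in range(nx)) / z for y in range(nx)]


def train(P, pi0, num_cells, n_grid=10, beta=0.9, steps=20000, seed=0):
    """Discounted-cost Q-learning on quantized predictor states (illustrative)."""
    rng = random.Random(seed)
    nx = len(pi0)
    # Action set: every map from the source alphabet to the channel alphabet.
    actions = list(itertools.product(range(num_cells), repeat=nx))
    qtable = defaultdict(float)   # qtable[(state, action_index)], starts at 0
    visits = defaultdict(int)     # visit counts for step sizes 1/(1 + N)
    x = rng.choices(range(nx), weights=pi0)[0]
    pi = list(pi0)
    for _ in range(steps):
        s = quantize_belief(pi, n_grid)
        a = rng.randrange(len(actions))      # uniform random exploration
        quantizer = actions[a]
        c = stage_cost(pi, quantizer, num_cells)
        q = quantizer[x]                     # channel symbol actually sent
        pi = update_belief(pi, quantizer, q, P)
        x = rng.choices(range(nx), weights=P[x])[0]
        s_next = quantize_belief(pi, n_grid)
        target = c + beta * min(qtable[(s_next, b)] for b in range(len(actions)))
        visits[(s, a)] += 1
        alpha = 1.0 / (1 + visits[(s, a)])
        qtable[(s, a)] = (1 - alpha) * qtable[(s, a)] + alpha * target

    def policy(belief):
        """Greedy (cost-minimizing) quantizer for the quantized belief."""
        s = quantize_belief(belief, n_grid)
        return actions[min(range(len(actions)), key=lambda b: qtable[(s, b)])]

    return policy


if __name__ == "__main__":
    # Hypothetical 3-symbol "sticky" Markov source sent over a 2-symbol channel.
    P = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    pi0 = [1 / 3, 1 / 3, 1 / 3]
    policy = train(P, pi0, num_cells=2)
    print("quantizer chosen at the uniform belief:", policy(pi0))
```

Because actions are drawn uniformly at random, the learning is off-policy, and the greedy `policy` maps any belief to a quantizer. The abstract relates discounted-cost and average-cost solutions; here `beta=0.9` and `n_grid=10` are arbitrary illustrative choices, and a short toy run like this one says nothing about the near-optimality guarantee, which concerns the limit of finer belief quantization under the paper's conditions.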
Authors: Liam Cregg, Tamas Linder, Serdar Yuksel