
Spectral Statistics of the Sample Covariance Matrix for High Dimensional Linear Gaussians (2312.05794v1)

Published 10 Dec 2023 in math.ST, cs.LG, cs.SY, eess.SY, math.PR, stat.ML, and stat.TH

Abstract: The performance of the ordinary least squares (OLS) method for the \emph{estimation of a high dimensional stable state transition matrix} $A$ (i.e., spectral radius $\rho(A)<1$) from a single noisy observed trajectory of the linear time invariant (LTI)\footnote{Linear Gaussian (LG) in the Markov chain literature.} system $X_{-}:=(x_0,x_1,\ldots,x_{N-1})$ satisfying \begin{equation} x_{t+1}=Ax_{t}+w_{t}, \hspace{10pt} \text{where } w_{t} \sim N(0,I_{n}), \end{equation} relies heavily on negative moments of the sample covariance matrix $X_{-}X_{-}^{*}=\sum_{i=0}^{N-1}x_{i}x_{i}^{*}$ and on the singular values of $EX_{-}^{*}$, where $E$ is the rectangular Gaussian ensemble $E=[w_0,\ldots,w_{N-1}]$. Negative moments require sharp estimates on all the eigenvalues $\lambda_{1}\big(X_{-}X_{-}^{*}\big) \geq \ldots \geq \lambda_{n}\big(X_{-}X_{-}^{*}\big) \geq 0$. Leveraging recent results on the spectral theorem for non-Hermitian operators in \cite{naeem2023spectral}, along with the concentration of measure phenomenon and perturbation theory (Gershgorin's circle theorem and Cauchy's interlacing theorem), we show that only when $A=A^{*}$ is the typical order $\lambda_{j}\big(X_{-}X_{-}^{*}\big) \in \big[N-n\sqrt{N}, N+n\sqrt{N}\big]$ for all $j \in [n]$. However, in \emph{high dimensions}, when $A$ has only one distinct eigenvalue $\lambda$ with geometric multiplicity one, then as soon as the eigenvalue leaves the \emph{complex half unit disc}, the largest eigenvalue suffers from the curse of dimensionality: $\lambda_{1}\big(X_{-}X_{-}^{*}\big)=\Omega\big(\lfloor\tfrac{N}{n}\rfloor e^{\alpha_{\lambda}n}\big)$, while the smallest eigenvalue satisfies $\lambda_{n}\big(X_{-}X_{-}^{*}\big) \in (0, N+\sqrt{N}]$. Consequently, the OLS estimator incurs a \emph{phase transition} and becomes \emph{transient: increasing the iteration count only worsens the estimation error}, all while the dynamics are generated by stable systems.

References (17)
  1. Sheldon Axler. Down with determinants! The American mathematical monthly, 102(2):139–154, 1995.
  2. Sheldon Axler. Linear algebra done right. Springer Science & Business Media, 1997.
  3. Concentration inequalities on product spaces with applications to Markov processes. arXiv preprint math/0505536, 2005.
  4. Transportation cost-information inequalities and applications to random dynamical systems and diffusions. Annals of Probability, 32(3B):2702–2732, 2004.
  5. High dimensional geometry and limitations in system identification. arXiv preprint arXiv:2305.12083, 2023.
  6. From spectral theorem to statistical independence with application to system identification. arXiv preprint arXiv:2310.10523, 2023.
  7. Least squares regression with markovian data: Fundamental limits and algorithms. Advances in neural information processing systems, 33:16666–16676, 2020.
  8. Revisiting Ho–Kalman-based system identification: Robustness and finite-sample analysis. IEEE Transactions on Automatic Control, 67(4):1914–1928, 2021.
  9. Mark Rudelson. Recent developments in non-asymptotic theory of random matrices. Modern aspects of random matrix theory, 72:83, 2014.
  10. Near optimal finite time identification of arbitrary linear dynamical systems. In International Conference on Machine Learning, pages 5610–5618. PMLR, 2019.
  11. Finite-time system identification for partially observed LTI systems of unknown order. arXiv preprint arXiv:1902.01848, 2019.
  12. Learning without mixing: Towards a sharp analysis of linear system identification. In Conference On Learning Theory, pages 439–473, 2018.
  13. Michel Talagrand. Transportation cost for Gaussian and other product measures. Geometric & Functional Analysis GAFA, 6(3):587–600, 1996.
  14. Random matrices: Universality of ESDs and the circular law. 2010.
  15. Linear systems can be hard to learn. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 2903–2910. IEEE, 2021.
  16. Online learning of the Kalman filter with logarithmic regret. IEEE Transactions on Automatic Control, 2022.
  17. The condition number of a randomly perturbed matrix. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 248–255, 2007.
