Spectral Statistics of the Sample Covariance Matrix for High Dimensional Linear Gaussians (2312.05794v1)
Abstract: The performance of the ordinary least squares (OLS) method for the \emph{estimation of a high dimensional stable state transition matrix} $A$ (i.e., spectral radius $\rho(A)<1$) from a single noisy observed trajectory of the linear time invariant (LTI)\footnote{Linear Gaussian (LG) in the Markov chain literature} system $X_{-}:(x_0,x_1, \ldots,x_{N-1})$ satisfying \begin{equation} x_{t+1}=Ax_{t}+w_{t}, \hspace{10pt} \text{ where } w_{t} \thicksim N(0,I_{n}), \end{equation} relies heavily on negative moments of the sample covariance matrix $X_{-}X_{-}^{*}=\sum_{i=0}^{N-1}x_{i}x_{i}^{*}$ and on the singular values of $EX_{-}^{*}$, where $E$ is the rectangular Gaussian ensemble $E=[w_0, \ldots, w_{N-1}]$. Negative moments require sharp estimates on all the eigenvalues $\lambda_{1}\big(X_{-}X_{-}^{*}\big) \geq \ldots \geq \lambda_{n}\big(X_{-}X_{-}^{*}\big) \geq 0$. Leveraging recent results on the spectral theorem for non-Hermitian operators in \cite{naeem2023spectral}, along with the concentration of measure phenomenon and perturbation theory (Gershgorin's and Cauchy's interlacing theorems), we show that only when $A=A^{*}$ is the typical order $\lambda_{j}\big(X_{-}X_{-}^{*}\big) \in \big[N-n\sqrt{N},\, N+n\sqrt{N}\big]$ for all $j \in [n]$. However, in \emph{high dimensions}, when $A$ has only one distinct eigenvalue $\lambda$ with geometric multiplicity one, then as soon as the eigenvalue leaves the \emph{complex half unit disc}, the largest eigenvalue suffers from the curse of dimensionality: $\lambda_{1}\big(X_{-}X_{-}^{*}\big)=\Omega\big( \lfloor\frac{N}{n}\rfloor e^{\alpha_{\lambda}n} \big)$, while the smallest eigenvalue $\lambda_{n}\big(X_{-}X_{-}^{*}\big) \in (0, N+\sqrt{N}]$. Consequently, the OLS estimator undergoes a \emph{phase transition} and becomes \emph{transient: increasing the iteration count only worsens the estimation error}, all of this happening even though the dynamics are generated by stable systems.
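The two spectral regimes in the abstract can be illustrated numerically. The sketch below is not taken from the paper; the particular choices (a symmetric $A = 0.9\,I$ and a single Jordan block with eigenvalue $\lambda = 0.6 > 1/2$, i.e., outside the complex half unit disc) are hypothetical examples chosen to make the contrast visible: in the symmetric case all eigenvalues of $X_{-}X_{-}^{*}$ share the same order and OLS succeeds, while in the Jordan case the top eigenvalue blows up with dimension and the sample covariance becomes badly conditioned.

```python
import numpy as np

def simulate(A, N, rng):
    """Roll out x_{t+1} = A x_t + w_t with w_t ~ N(0, I_n), starting at x_0 = 0.

    Returns X_minus (columns x_0..x_{N-1}) and X_plus (columns x_1..x_N).
    """
    n = A.shape[0]
    x = np.zeros(n)
    X_minus = np.empty((n, N))
    X_plus = np.empty((n, N))
    for t in range(N):
        X_minus[:, t] = x
        x = A @ x + rng.standard_normal(n)
        X_plus[:, t] = x
    return X_minus, X_plus

rng = np.random.default_rng(0)

# Regime 1 (hypothetical example): symmetric stable A. All eigenvalues of
# X_- X_-^* share the same typical order (up to constants depending on
# rho(A)), and the OLS estimate A_hat = X_+ X_-^* (X_- X_-^*)^{-1} is accurate.
n, N = 5, 2000
A_sym = 0.9 * np.eye(n)
X_minus, X_plus = simulate(A_sym, N, rng)
S = X_minus @ X_minus.T                        # sample covariance matrix
eigs_sym = np.sort(np.linalg.eigvalsh(S))[::-1]
A_hat = X_plus @ X_minus.T @ np.linalg.inv(S)  # OLS estimator
err = np.linalg.norm(A_hat - A_sym, 2)

# Regime 2 (hypothetical example): one Jordan block, eigenvalue lam with
# |lam| > 1/2 and geometric multiplicity one. Noise entering the bottom
# coordinate is amplified through the superdiagonal chain, so lambda_1
# blows up with n while lambda_n stays modest.
n2, N2 = 30, 200
lam = 0.6
J = lam * np.eye(n2) + np.diag(np.ones(n2 - 1), k=1)
X_minus2, _ = simulate(J, N2, rng)
eigs_jordan = np.sort(np.linalg.eigvalsh(X_minus2 @ X_minus2.T))[::-1]

print(f"symmetric A : lambda_n={eigs_sym[-1]:.1f}, "
      f"lambda_1={eigs_sym[0]:.1f}, OLS error={err:.3f}")
print(f"Jordan block: lambda_n={eigs_jordan[-1]:.3e}, "
      f"lambda_1={eigs_jordan[0]:.3e}")
```

With these (assumed) parameters the symmetric run produces a well conditioned sample covariance and a small OLS error, whereas the Jordan run exhibits the eigenvalue split: $\lambda_1$ is astronomically larger than $\lambda_n$, which is exactly what makes the negative moments of $X_{-}X_{-}^{*}$, and hence OLS, blow up.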
- Sheldon Axler. Down with determinants! The American Mathematical Monthly, 102(2):139–154, 1995.
- Sheldon Axler. Linear algebra done right. Springer Science & Business Media, 1997.
- Concentration inequalities on product spaces with applications to Markov processes. arXiv preprint math/0505536, 2005.
- Transportation cost-information inequalities and applications to random dynamical systems and diffusions. Annals of Probability, 32(3B):2702–2732, 2004.
- High dimensional geometry and limitations in system identification. arXiv preprint arXiv:2305.12083, 2023.
- From spectral theorem to statistical independence with application to system identification. arXiv preprint arXiv:2310.10523, 2023.
- Least squares regression with Markovian data: Fundamental limits and algorithms. Advances in Neural Information Processing Systems, 33:16666–16676, 2020.
- Revisiting Ho–Kalman-based system identification: Robustness and finite-sample analysis. IEEE Transactions on Automatic Control, 67(4):1914–1928, 2021.
- Mark Rudelson. Recent developments in non-asymptotic theory of random matrices. Modern aspects of random matrix theory, 72:83, 2014.
- Near optimal finite time identification of arbitrary linear dynamical systems. In International Conference on Machine Learning, pages 5610–5618. PMLR, 2019.
- Finite-time system identification for partially observed LTI systems of unknown order. arXiv preprint arXiv:1902.01848, 2019.
- Learning without mixing: Towards a sharp analysis of linear system identification. In Conference On Learning Theory, pages 439–473, 2018.
- Michel Talagrand. Transportation cost for Gaussian and other product measures. Geometric & Functional Analysis GAFA, 6(3):587–600, 1996.
- Random matrices: Universality of ESDs and the circular law. 2010.
- Linear systems can be hard to learn. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 2903–2910. IEEE, 2021.
- Online learning of the Kalman filter with logarithmic regret. IEEE Transactions on Automatic Control, 2022.
- The condition number of a randomly perturbed matrix. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 248–255, 2007.