Chaos as an interpretable benchmark for forecasting and data-driven modelling (2110.05266v2)

Published 11 Oct 2021 in cs.LG, eess.SP, and nlin.CD

Abstract: The striking fractal geometry of strange attractors underscores the generative nature of chaos: like probability distributions, chaotic systems can be repeatedly measured to produce arbitrarily-detailed information about the underlying attractor. Chaotic systems thus pose a unique challenge to modern statistical learning techniques, while retaining quantifiable mathematical properties that make them controllable and interpretable as benchmarks. Here, we present a growing database currently comprising 131 known chaotic dynamical systems spanning fields such as astrophysics, climatology, and biochemistry. Each system is paired with precomputed multivariate and univariate time series. Our dataset has comparable scale to existing static time series databases; however, our systems can be re-integrated to produce additional datasets of arbitrary length and granularity. Our dataset is annotated with known mathematical properties of each system, and we perform feature analysis to broadly categorize the diverse dynamics present across the collection. Chaotic systems inherently challenge forecasting models, and across extensive benchmarks we correlate forecasting performance with the degree of chaos present. We also exploit the unique generative properties of our dataset in several proof-of-concept experiments: surrogate transfer learning to improve time series classification, importance sampling to accelerate model training, and benchmarking symbolic regression algorithms.

Summary

The paper introduces a novel dataset of 131 chaotic dynamical systems and benchmarks 16 forecasting models to correlate system chaoticity with forecasting performance.
It demonstrates that modern models like NBEATS and Transformers outperform traditional methods in long-sequence forecasting scenarios.
The study reveals that incorporating mathematical properties such as Lyapunov exponents and entropy enhances the interpretability of data-driven modeling techniques.

Chaos as an Interpretable Benchmark for Forecasting and Data-Driven Modelling

The paper "Chaos as an Interpretable Benchmark for Forecasting and Data-Driven Modelling" evaluates chaotic systems as benchmarks for modern statistical learning techniques. The authors introduce a database of 131 known chaotic dynamical systems drawn from fields such as astrophysics, climatology, and biochemistry. Each system is annotated with known mathematical properties and paired with precomputed time series; the collection is comparable in scale to existing static time series databases, but its systems can be re-integrated to produce new datasets of arbitrary length and granularity.

Key Contributions

Chaotic Systems as Benchmarks:
- The dataset includes both multivariate and univariate time series, and each system is annotated with mathematical properties such as Lyapunov exponents, entropy, and fractal dimension. Because the systems can be re-integrated, additional data of arbitrary length and granularity can be generated on demand.
- The authors correlate the degree of chaos within the systems with forecasting performance across extensive benchmarks. This allows different forecasting techniques to be evaluated in the context of underlying system chaoticity.
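
To make the idea of re-integration concrete, the following is a minimal sketch rather than the authors' code. It integrates the Lorenz equations, a textbook chaotic system chosen here purely for illustration, with a hand-written fourth-order Runge-Kutta solver. The function names, parameter values, and initial condition are all assumptions of this sketch.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classic chaotic parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """Advance `state` by one fourth-order Runge-Kutta step of size `dt`."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(f, initial_state, n_samples, sample_dt, substeps=10):
    """Return `n_samples` states spaced `sample_dt` apart.

    Each sampling interval is split into `substeps` RK4 steps, so a
    coarse sampling interval does not force a coarse solver step.
    """
    state = tuple(initial_state)
    h = sample_dt / substeps
    trajectory = []
    for _ in range(n_samples):
        trajectory.append(state)
        for _ in range(substeps):
            state = rk4_step(f, state, h)
    return trajectory


# Same system and time span, sampled at two different granularities.
coarse = integrate(lorenz, (1.0, 1.0, 1.0), n_samples=500, sample_dt=0.05)
fine = integrate(lorenz, (1.0, 1.0, 1.0), n_samples=5000, sample_dt=0.005)
univariate = [state[0] for state in fine]  # keep only the x-coordinate
```

Changing `n_samples` and `sample_dt` yields a new multivariate series of any length and resolution, and selecting one coordinate gives a univariate series.
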
Forecasting Experiments:
- Sixteen forecasting models are benchmarked across the chaotic systems dataset. These models include both traditional techniques (e.g., ARIMA) and modern machine learning methods (e.g., Transformers, LSTMs).
- The analysis finds that models such as NBEATS and Transformers consistently outperform traditional statistical models, especially in long-sequence forecasting scenarios, a finding that challenges existing perceptions of model efficacy.
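
Relating forecast quality to chaoticity needs two ingredients: a per-system error score and a correlation measure across systems. The sketch below uses sMAPE for the error and Spearman rank correlation for the comparison. Both choices are assumptions of this illustration rather than details stated in this summary, and the helper names are hypothetical.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        if denom > 0.0:  # a term with a == p == 0 contributes no error
            total += 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

With one Lyapunov exponent and one sMAPE score per system, `spearman(exponents, errors)` summarizes how strongly forecast error tracks the degree of chaos for a given model.
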
Data-Driven Modeling Applications:
- Several proof-of-concept experiments use the dataset to illustrate novel modeling approaches: surrogate transfer learning for time series classification, importance sampling to accelerate model training, and benchmarking of symbolic regression algorithms.
- The paper demonstrates that symbolic regression accuracy correlates with certain mathematical properties of chaotic systems, offering insight into the complexity of inferring governing equations from data.
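
As a rough illustration of what inferring governing equations from data involves, the sketch below fits a chaotic series against a small fixed library of candidate terms by least squares. This is a heavily simplified stand-in for symbolic regression, not one of the algorithms benchmarked in the paper. The logistic map, the two-term library, and all function names are assumptions made for the example.

```python
def logistic_series(r, x0, n):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return n + 1 values."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def fit_quadratic_map(xs):
    """Least-squares fit of x_{n+1} = a * x_n + b * x_n**2.

    Solves the 2x2 normal equations for the candidate terms [x, x**2].
    """
    inputs, targets = xs[:-1], xs[1:]
    s11 = sum(x ** 2 for x in inputs)
    s12 = sum(x ** 3 for x in inputs)
    s22 = sum(x ** 4 for x in inputs)
    t1 = sum(x * y for x, y in zip(inputs, targets))
    t2 = sum(x ** 2 * y for x, y in zip(inputs, targets))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


series = logistic_series(r=3.9, x0=0.2, n=500)
a, b = fit_quadratic_map(series)
# For this noise-free series the fit should give a close to 3.9 and
# b close to -3.9, i.e. the map x -> 3.9 * x - 3.9 * x**2.
```

Real symbolic regression must also search over which terms to include, which is where the properties of the underlying system can make recovery harder.
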
Implications for AI and Machine Learning:
- The dataset improves the interpretability of black-box time series algorithms by providing a systematic way to compare model performance across systems with diverse intrinsic properties.
- These experiments suggest that chaotic systems can serve as robust benchmarks for developing new forecasting and data modeling methods, and that they are more interpretable than traditional datasets, which may lack innate properties such as fractality and chaos.
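
One of the intrinsic properties mentioned above, the Lyapunov exponent, can be estimated directly from a system's equations. The sketch below does this for the one-dimensional logistic map, used here only as a simple illustrative system, by averaging the log of the absolute derivative along an orbit. The function name, defaults, and parameter values are assumptions of this example.

```python
import math


def logistic_lyapunov(r, x0=0.2, n_transient=1000, n_samples=20000):
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x).

    Averages log|f'(x)| = log|r * (1 - 2x)| along one orbit after
    discarding an initial transient.
    """
    x = x0
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_samples):
        slope = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(slope, 1e-300))  # floor avoids log(0) at x == 0.5
        x = r * x * (1.0 - x)
    return total / n_samples


chaotic = logistic_lyapunov(3.9)   # positive: nearby orbits diverge
periodic = logistic_lyapunov(3.2)  # negative: orbits settle onto a 2-cycle
```

A positive exponent signals chaos and a larger value means faster loss of predictability, which is what makes it a natural axis for comparing forecasting difficulty across systems.
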

Implications and Future Directions

The paper highlights the potential of chaotic systems as rich benchmarks for evaluating statistical learning models, particularly with respect to interpretability and the complexity of dynamical behavior. The findings underscore the importance of incorporating mathematical insight into model evaluation and the value of understanding the dynamics that underlie time series data.

Future directions could include expanding the database to incorporate more chaotic systems, especially those with higher-dimensional chaotic attractors. Additionally, further exploration of applications in control algorithms and neural ordinary differential equations could provide more extensive insights into the capabilities and limitations of AI-driven data modeling techniques.

By systematically capturing and analyzing the generative capabilities of chaotic systems, this research paves the way for advancements in both the theoretical understanding and practical application of complex system modeling and forecasting.