Feature-Based Time-Series Analysis in R using the theft Package (2208.06146v4)

Published 12 Aug 2022 in stat.ML, cs.LG, cs.MS, q-bio.QM, stat.AP, and stat.ME

Abstract: Time series are measured and analyzed across the sciences. One method of quantifying the structure of time series is by calculating a set of summary statistics or 'features', and then representing a time series in terms of its properties as a feature vector. The resulting feature space is interpretable and informative, and enables conventional statistical learning approaches, including clustering, regression, and classification, to be applied to time-series datasets. Many open-source software packages for computing sets of time-series features exist across multiple programming languages, including catch22 (22 features: Matlab, R, Python, Julia), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (779 features: Python), and TSFEL (390 features: Python). However, there are several issues: (i) a single access point to these packages is not currently available; (ii) to access all feature sets, users must be fluent in multiple languages; and (iii) these feature-extraction packages lack extensive accompanying methodological pipelines for performing feature-based time-series analysis, such as applications to time-series classification. Here we introduce a solution to these issues in an R software package called theft: Tools for Handling Extraction of Features from Time series. theft is a unified and extendable framework for computing features from the six open-source time-series feature sets listed above. It also includes a suite of functions for processing and interpreting the performance of extracted features, including extensive data-visualization templates, low-dimensional projections, and time-series classification operations. With an increasing volume and complexity of time-series datasets in the sciences and industry, theft provides a standardized framework for comprehensively quantifying and interpreting informative structure in time series.
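The idea of representing a time series as a feature vector can be illustrated with a minimal sketch. The example below is written in Python using only the standard library and is not the theft API or any of the six feature sets named in the abstract; the three features (mean, standard deviation, lag-1 autocorrelation) and the helper names `lag1_autocorrelation` and `feature_vector` are assumptions chosen purely for illustration.

```python
from statistics import fmean, pstdev


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation; returns 0.0 for a constant series."""
    mean = fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0
    # Sum of products of consecutive deviations from the mean.
    numer = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    return numer / denom


def feature_vector(values):
    """Map one time series to a small, named feature vector (illustrative only)."""
    return {
        "mean": fmean(values),
        "std": pstdev(values),
        "acf_lag1": lag1_autocorrelation(values),
    }


# Two toy series: a slow ramp and a rapidly alternating signal.
ramp = [float(t) for t in range(10)]
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(10)]

features = {
    "ramp": feature_vector(ramp),
    "alternating": feature_vector(alternating),
}

for name, vec in features.items():
    print(name, vec)
```

Each series, whatever its length, is reduced to the same fixed set of named values, so the two toy series can be compared directly: the ramp has positive lag-1 autocorrelation and the alternating signal has negative. Vectors of this kind are what conventional clustering, regression, or classification methods would then take as input.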
