Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System (2207.07827v5)

Published 16 Jul 2022 in cs.LG and cs.CV

Abstract: Long-term time-series forecasting (LTSF) is fundamental to various real-world applications, where Transformer-based models have become the dominant framework due to their ability to capture long-range dependencies. However, these models often overfit because of data redundancy in rolling forecasting settings, which limits their generalization ability; the effect is particularly pronounced in longer sequences with highly similar adjacent data. In this work, we introduce CLMFormer, a novel framework that mitigates redundancy through curriculum learning and a memory-driven decoder. Specifically, we progressively introduce Bernoulli noise into the training samples, which effectively breaks the high similarity between adjacent data points. This curriculum-driven noise supplies more diverse and representative training data to the memory-driven decoder, a component that captures seasonal tendencies and dependencies in the time series and leverages temporal relationships to facilitate forecasting. Extensive experiments on six real-world LTSF benchmarks show that CLMFormer consistently improves Transformer-based models by up to 30%, demonstrating its effectiveness in long-horizon forecasting.
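The abstract describes two mechanisms: curriculum-driven Bernoulli noise injected during training and a memory-driven decoder. The sketch below illustrates one plausible reading of each idea in PyTorch; the abstract does not specify implementations, so the dropout-style masking interpretation of Bernoulli noise, the linear curriculum schedule, the memory-bank readout, and all names and hyperparameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# --- Curriculum-driven Bernoulli noise (assumed schedule and masking form) ---

def curriculum_noise_prob(epoch: int, total_epochs: int, max_prob: float = 0.2) -> float:
    """Ramp the corruption probability linearly from 0 to max_prob over training."""
    return max_prob * min(epoch / max(total_epochs - 1, 1), 1.0)

def apply_bernoulli_noise(x: torch.Tensor, p: float) -> torch.Tensor:
    """Independently zero out each element of x with probability p."""
    if p <= 0.0:
        return x
    keep = torch.bernoulli(torch.full_like(x, 1.0 - p))  # 1 = keep, 0 = corrupt
    return x * keep

# --- Memory-driven readout (assumed form: attention over a learnable memory bank) ---

class MemoryReadout(nn.Module):
    def __init__(self, d_model: int, n_slots: int = 32):
        super().__init__()
        # Learnable memory slots intended to store recurring seasonal patterns.
        self.memory = nn.Parameter(0.02 * torch.randn(n_slots, d_model))
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, horizon, d_model) decoder hidden states.
        attn = torch.softmax(h @ self.memory.t() / h.size(-1) ** 0.5, dim=-1)
        retrieved = attn @ self.memory                       # (batch, horizon, d_model)
        return self.fuse(torch.cat([h, retrieved], dim=-1))  # fuse retrieval back in
```

In a rolling-forecasting training loop, one would compute `p = curriculum_noise_prob(epoch, total_epochs)` once per epoch, corrupt each input window with `apply_bernoulli_noise` before the forward pass, and apply `MemoryReadout` to the decoder outputs before the final projection. This is a sketch under the stated assumptions, not the authors' released code.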
