Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors (2306.12041v2)
Abstract: We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level. The novelty of the proposed model is threefold. First, we introduce an approach to weight tokens based on motion gradients, thus shifting the focus from the static background scene to the foreground objects. Second, we integrate a teacher decoder and a student decoder into our architecture, leveraging the discrepancy between the outputs given by the two decoders to improve anomaly detection. Third, we generate synthetic abnormal events to augment the training videos, and task the masked AE model to jointly reconstruct the original frames (without anomalies) and the corresponding pixel-level anomaly maps. Our design leads to an efficient and effective model, as demonstrated by the extensive experiments carried out on four benchmarks: Avenue, ShanghaiTech, UBnormal and UCSD Ped2. The empirical results show that our model achieves an excellent trade-off between speed and accuracy, obtaining competitive AUC scores, while processing 1655 frames per second (FPS). Hence, our model is between 8 and 70 times faster than competing methods. We also conduct an ablation study to justify our design. Our code is freely available at: https://github.com/ristea/aed-mae.
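To make the first two ideas in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the absolute difference between consecutive grayscale frames is an acceptable stand-in for "motion gradients", that tokens are non-overlapping square patches, and that the teacher-student discrepancy is a per-patch mean absolute difference. All function names (`patch_means`, `motion_token_weights`, `weighted_reconstruction_error`, `decoder_discrepancy`) are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def patch_means(img, patch):
    """Average a (H, W) map over non-overlapping patch x patch tiles."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0, "frame size must be divisible by patch"
    tiles = img.reshape(h // patch, patch, w // patch, patch)
    return tiles.mean(axis=(1, 3))  # shape: (H / patch, W / patch)


def motion_token_weights(prev_frame, frame, patch, eps=1e-8):
    """Per-token weights from temporal change.

    Assumption: |frame - prev_frame| approximates the motion gradient
    magnitude; the paper may compute it differently.
    """
    motion = np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64))
    weights = patch_means(motion, patch)
    return weights / (weights.sum() + eps)  # normalize so weights sum to ~1


def weighted_reconstruction_error(frame, recon, weights, patch):
    """Squared error per token, weighted so that moving regions dominate."""
    per_token_error = patch_means((frame - recon) ** 2, patch)
    return float((weights * per_token_error).sum())


def decoder_discrepancy(teacher_out, student_out, patch):
    """Per-token mean absolute difference between two decoder outputs
    (an assumed form of the teacher-student discrepancy signal)."""
    return patch_means(np.abs(teacher_out - student_out), patch)


if __name__ == "__main__":
    prev = np.zeros((8, 8))
    curr = np.zeros((8, 8))
    curr[:4, :4] = 1.0  # a "foreground object" appears in the top-left token

    w = motion_token_weights(prev, curr, patch=4)
    print(w)  # almost all weight falls on the top-left token

    # An error confined to the static background is ignored by the weighting.
    background_recon = curr.copy()
    background_recon[4:, 4:] += 0.5
    print(weighted_reconstruction_error(curr, background_recon, w, patch=4))
```

Under these assumptions, a reconstruction error in a static region contributes nothing to the loss, while the same error on a moving object is counted in full, which is the shift of focus from background to foreground that the abstract describes.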