Temporal Action Localization for Inertial-based Human Activity Recognition (2311.15831v2)
Abstract: To date, state-of-the-art activity recognition from wearable sensors relies on algorithms trained to classify fixed windows of data. In contrast, video-based Human Activity Recognition, known as Temporal Action Localization (TAL), follows a segment-based prediction approach, localizing activity segments in a timeline of arbitrary length. This paper is the first to systematically demonstrate the applicability of state-of-the-art TAL models to both offline and near-online Human Activity Recognition (HAR), using raw inertial data as well as pre-extracted latent features as input. Offline prediction results show that TAL models outperform popular inertial models on a multitude of HAR benchmark datasets, with improvements of up to 26% in F1-score. We show that by analyzing timelines as a whole, TAL models produce more coherent segments and achieve higher NULL-class accuracy across all datasets. While TAL is less suited for the immediate classification of small windows of data, it offers an interesting perspective on inertial-based HAR, alleviating the need for fixed-size windows and enabling algorithms to recognize activities of arbitrary length. With design choices and training concepts yet to be explored, we argue that TAL architectures could be of significant value to the inertial-based HAR community. Code and data to reproduce our experiments are publicly available via github.com/mariusbock/tal_for_har.