
Temporal Action Localization for Inertial-based Human Activity Recognition (2311.15831v2)

Published 27 Nov 2023 in cs.LG, cs.HC, and eess.SP

Abstract: As of today, state-of-the-art activity recognition from wearable sensors relies on algorithms being trained to classify fixed windows of data. In contrast, video-based Human Activity Recognition, known as Temporal Action Localization (TAL), has followed a segment-based prediction approach, localizing activity segments in a timeline of arbitrary length. This paper is the first to systematically demonstrate the applicability of state-of-the-art TAL models for both offline and near-online Human Activity Recognition (HAR) using raw inertial data as well as pre-extracted latent features as input. Offline prediction results show that TAL models are able to outperform popular inertial models on a multitude of HAR benchmark datasets, with improvements reaching as much as 26% in F1-score. We show that by analyzing timelines as a whole, TAL models can produce more coherent segments and achieve higher NULL-class accuracy across all datasets. We demonstrate that TAL is less suited for the immediate classification of small-sized windows of data, yet offers an interesting perspective on inertial-based HAR -- alleviating the need for fixed-size windows and enabling algorithms to recognize activities of arbitrary length. With design choices and training concepts yet to be explored, we argue that TAL architectures could be of significant value to the inertial-based HAR community. The code and data to reproduce our experiments are publicly available via github.com/mariusbock/tal_for_har.
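The abstract contrasts two prediction paradigms: classic inertial HAR classifies fixed-size windows sliced from the sensor stream, while TAL emits activity segments of arbitrary length on the full timeline. A minimal sketch of that contrast (the function names and parameters here are illustrative, not from the paper's codebase):

```python
import numpy as np

def sliding_windows(signal, win_len, stride):
    """Classic inertial HAR preprocessing: slice a 1-D sensor stream into
    fixed-size windows, each of which would be classified independently."""
    starts = range(0, len(signal) - win_len + 1, stride)
    return np.stack([signal[s:s + win_len] for s in starts])

def labels_to_segments(labels):
    """Segment-style output as in TAL: merge runs of identical per-sample
    labels into (start, end, class) segments of arbitrary length."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments
```

The fixed-window view forces every activity into equal-length chunks regardless of its true duration, whereas the segment view naturally represents activities of any length, including long NULL-class spans.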

