TREC: APT Tactic / Technique Recognition via Few-Shot Provenance Subgraph Learning (2402.15147v2)
Abstract: Advanced Persistent Threats (APTs), characterized by persistence, stealth, and diversity, are among the greatest threats to cyber-infrastructure. As a countermeasure, existing studies leverage provenance graphs to capture the complex relations between system entities on a host for effective APT detection. Most existing work detects individual attack events. For security operations, however, it is more important to understand the tactics and techniques (e.g., those defined by the Cyber Kill Chain or MITRE ATT&CK) used to organize and carry out an APT campaign. Existing studies rely on manually designed rules to map low-level system events to high-level APT tactics and techniques. However, such rule-based methods are coarse-grained and generalize poorly: they can recognize only APT tactics, and cannot identify fine-grained APT techniques or mutated APT attacks. In this paper, we propose TREC, the first attempt to recognize APT tactics and techniques from provenance graphs using deep learning. To address the "needle in a haystack" problem, TREC uses a malicious node detection model and a subgraph sampling algorithm to extract small, compact subgraphs from a large provenance graph, each covering an individual APT technique instance. To address the "training sample scarcity" problem, TREC trains its APT tactic/technique recognition model in a few-shot learning manner using a Siamese neural network. We evaluate TREC on a custom dataset that our team collected and made public. The experimental results show that TREC significantly outperforms state-of-the-art systems in APT tactic recognition and can also effectively identify APT techniques.
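The two-stage idea described above (cut small subgraphs around suspicious nodes, then label each subgraph by comparing it with a handful of labeled examples) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the provenance graph is a plain list of `(source, relation, destination)` edges; a given set of seed nodes stands in for the output of the malicious node detection model; a k-hop neighborhood stands in for TREC's subgraph sampling algorithm; and a fixed bag-of-relation-types embedding with cosine similarity replaces the learned Siamese encoder. The names `khop_subgraph`, `embed`, `cosine`, and `classify`, as well as the toy entities and technique labels, are hypothetical and are not TREC's code.

```python
from collections import Counter, deque
from math import sqrt
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical provenance-graph edge: (source entity, relation, destination entity).
Edge = Tuple[str, str, str]


def khop_subgraph(edges: List[Edge], seeds: Set[str], k: int = 1) -> List[Edge]:
    """Keep edges whose two endpoints lie within k undirected hops of a seed node.

    Assumed stand-in for TREC's subgraph sampling step; `seeds` plays the role of
    the nodes flagged by the malicious node detection model.
    """
    neighbors: Dict[str, Set[str]] = {}
    for src, _, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:  # breadth-first search that stops expanding at depth k
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return [e for e in edges if e[0] in dist and e[2] in dist]


def embed(subgraph: List[Edge]) -> Counter:
    """Hypothetical fixed encoder: a bag of relation types.

    TREC learns subgraph embeddings with a Siamese network; this only mimics that role.
    """
    return Counter(rel for _, rel, _ in subgraph)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query: List[Edge], support: Dict[str, List[List[Edge]]]) -> Optional[str]:
    """Few-shot matching: return the label of the most similar labeled support example."""
    q = embed(query)
    best_label, best_score = None, -1.0
    for label, examples in support.items():
        for example in examples:
            score = cosine(q, embed(example))
            if score > best_score:
                best_label, best_score = label, score
    return best_label


# Toy data; entity names and technique labels are illustrative only.
graph = [
    ("bash", "fork", "curl"),
    ("curl", "connect", "10.0.0.5"),
    ("curl", "write", "/tmp/payload"),
    ("bash", "exec", "/tmp/payload"),
    ("sshd", "fork", "bash"),
    ("cron", "read", "/etc/crontab"),
]
support = {
    "download_and_execute": [[("sh", "fork", "wget"), ("wget", "connect", "1.2.3.4"),
                              ("wget", "write", "/tmp/x"), ("sh", "exec", "/tmp/x")]],
    "cron_persistence": [[("bash", "write", "/etc/crontab"), ("cron", "read", "/etc/crontab"),
                          ("cron", "fork", "sh")]],
}
subgraph = khop_subgraph(graph, seeds={"curl"}, k=1)
print(len(subgraph), classify(subgraph, support))  # 4 download_and_execute
```

The point of the matching step is that classification reduces to similarity against a few labeled examples, so a class with very little training data still yields usable comparisons. In TREC the similarity function is learned by the Siamese network rather than fixed as it is here.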