Runtime Monitoring of Human-centric Requirements in Machine Learning Components: A Model-driven Engineering Approach (2310.06219v1)

Published 10 Oct 2023 in cs.SE

Abstract: As ML components become increasingly integrated into software systems, the emphasis on the ethical or responsible aspects of their use has grown significantly. This includes building ML-based systems that adhere to human-centric requirements such as fairness, privacy, explainability, well-being, transparency, and human values. Meeting these human-centric requirements is not only essential for maintaining public trust but also a key factor in the success of ML-based systems. However, because these requirements are dynamic and continually evolve, pre-deployment monitoring of these models often proves insufficient to establish and sustain trust in ML components. Runtime monitoring approaches for ML are potentially valuable solutions to this problem. Existing state-of-the-art techniques often fall short, as they seldom consider more than one human-centric requirement, typically focusing on fairness, safety, or trust. Setting up a monitoring system also demands considerable technical expertise and effort. In my PhD research, I propose a novel approach to the runtime monitoring of multiple human-centric requirements, leveraging model-driven engineering to monitor ML components more comprehensively. This doctoral symposium paper outlines the motivation for my PhD work, a potential solution, progress so far, and future plans.
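To illustrate the kind of runtime monitoring the abstract describes, the sketch below implements one human-centric check, fairness, as a sliding-window monitor over a deployed model's decisions. This is a minimal illustration of the general monitoring concept, not the paper's model-driven approach; the class name, metric choice (demographic parity difference), and threshold are all assumptions introduced here.

```python
from collections import deque

class FairnessMonitor:
    """Minimal sketch of a runtime fairness monitor: it tracks the
    demographic parity difference over a sliding window of decisions.
    Names and thresholds are illustrative, not taken from the paper."""

    def __init__(self, window_size=1000, threshold=0.1):
        self.window = deque(maxlen=window_size)  # (group, prediction) pairs
        self.threshold = threshold               # max tolerated disparity

    def observe(self, group, prediction):
        """Record one model decision observed at runtime."""
        self.window.append((group, int(prediction)))

    def parity_difference(self):
        """Max gap in positive-prediction rates between any two groups."""
        rates = {}
        for g in {grp for grp, _ in self.window}:
            preds = [p for grp, p in self.window if grp == g]
            rates[g] = sum(preds) / len(preds)
        if len(rates) < 2:
            return 0.0
        vals = sorted(rates.values())
        return vals[-1] - vals[0]

    def violated(self):
        """Flag when the disparity exceeds the configured threshold."""
        return self.parity_difference() > self.threshold

# Example: a model that approves group "a" far more often than group "b"
monitor = FairnessMonitor(window_size=100, threshold=0.2)
for i in range(50):
    monitor.observe("a", 1)       # group a: always approved
    monitor.observe("b", i % 2)   # group b: approved half the time
print(monitor.parity_difference())  # 0.5
print(monitor.violated())           # True
```

A fuller monitor along the lines the paper motivates would run several such checks side by side (e.g. privacy, explainability), with the metrics and thresholds specified in a model rather than hand-coded.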

