Functional trustworthiness of AI systems by statistically valid testing

Published 4 Oct 2023 in stat.ML, cs.AI, and cs.LG | (2310.02727v1)

Abstract: The authors are concerned about the safety, health, and rights of the European citizens due to inadequate measures and procedures required by the current draft of the EU AI Act for the conformity assessment of AI systems. We observe that not only the current draft of the EU AI Act, but also the accompanying standardization efforts in CEN/CENELEC, have resorted to the position that real functional guarantees of AI systems supposedly would be unrealistic and too complex anyways. Yet enacting a conformity assessment procedure that creates the false illusion of trust in insufficiently assessed AI systems is at best naive and at worst grossly negligent. The EU AI Act thus misses the point of ensuring quality by functional trustworthiness and correctly attributing responsibilities. The trustworthiness of an AI decision system lies first and foremost in the correct statistical testing on randomly selected samples and in the precision of the definition of the application domain, which enables drawing samples in the first place. We will subsequently call this testable quality functional trustworthiness. It includes a design, development, and deployment that enables correct statistical testing of all relevant functions. We are firmly convinced and advocate that a reliable assessment of the statistical functional properties of an AI system has to be the indispensable, mandatory nucleus of the conformity assessment. In this paper, we describe the three necessary elements to establish a reliable functional trustworthiness, i.e., (1) the definition of the technical distribution of the application, (2) the risk-based minimum performance requirements, and (3) the statistically valid testing based on independent random samples.



Summary

The paper argues that statistically valid testing is essential to ensure AI systems meet predefined functional performance standards.
It emphasizes establishing a clear technical distribution and risk-based minimum performance requirements for representative testing.
The study critiques the EU AI Act for relying too much on documentation while neglecting rigorous, data-driven validation protocols.

Functional Trustworthiness of AI Systems: A Critical Examination

The reviewed paper critically evaluates the current draft of the European Union's AI Act and the standardization efforts that accompany it, focusing on the role of functional trustworthiness established through statistically valid testing. It treats the statistical methods that underlie machine learning (ML) and deep learning (DL) as the basis for assessing the robustness, accuracy, and transparency of AI systems, and argues that the existing regulation falls short on these points.

Key Arguments and Concepts

The paper stresses that functional trustworthiness is indispensable: AI systems must pass statistically valid tests on independent random samples to show that they meet predefined performance standards. The concept rests on three foundational elements:

Definition of the Technical Distribution: Establishing a precise application domain is crucial. This means characterizing the technical distribution of the application, which is what makes it possible to draw the representative random samples needed for testing. With the domain pinned down, model performance can be measured against it accurately.
Risk-Based Minimum Performance Requirements: The paper places risk analysis at the core of system development and argues that performance requirements must derive from a thorough understanding of the application's risks. That analysis should determine the acceptable operational thresholds, including those for safety and non-discrimination.
Statistically Valid Testing: Testing AI models on data sampled at random from the defined distribution is critical for assessing performance. This approach checks that an AI system performs as intended within its deployment scope, which addresses concerns about the unpredictability of high-complexity models such as those used in DL.

Criticisms of the Current EU AI Act

The authors criticize the EU AI Act for emphasizing documentation over empirical validation of AI system quality. They argue that the Act's framework may allow insufficiently tested AI systems to enter the market with a false appearance of reliability. They are especially critical of the inadequate provisions for random sampling and statistical validation in the testing requirements, which they see as a gap between the regulatory guidelines and established ML principles.

Practical and Theoretical Implications

Practically, the paper outlines how adherence to these methodological principles could significantly enhance AI systems' trustworthiness and reliability, leading to better risk management in real-world applications. Such an approach supports a finer granularity in AI system deployment, emphasizing the need for application-specific testing to mitigate potential biases and vulnerabilities. Theoretically, the discussion solidifies the necessity of bridging traditional engineering approaches with data-driven techniques to forge robust AI solutions.

Future Directions

The paper advocates for a reformed AI regulatory environment that integrates functional trustworthiness as a core component of conformity assessment. It speculates that future developments may see more rigorous enforcement of statistical testing protocols, potentially leading to new industry standards harmonized globally.

Additionally, the discussion of AI systems such as personal AI assistants, and of the trade-off between creativity and ethical outputs, points to future empirical research on balancing functionality against ethical constraints, especially as personal assistants become more entrenched in daily life.

Conclusion

This paper examines why statistically valid testing is critical to establishing functional trustworthiness for AI systems. It makes a compelling case that existing regulatory frameworks, such as the EU AI Act, have yet to integrate these principles into their core. By focusing on empirical validation through statistical methods and advocating a harmonized approach to AI regulation, the paper's arguments point the way toward future policy improvements.
