Abstract

Recently, the research community of computerized medical imaging has started to discuss and address potential fairness issues that may emerge when developing and deploying AI systems for medical image analysis. This chapter covers some of the pressing challenges encountered when doing research in this area, and it is intended to raise questions and provide food for thought for those aiming to enter this research field. The chapter first discusses various sources of bias, including data collection, model training, and clinical deployment, and their impact on the fairness of machine learning algorithms in medical image computing. We then turn to discussing open challenges that we believe require attention from researchers and practitioners, as well as potential pitfalls of naive application of common methods in the field. We cover a variety of topics including the impact of biased metrics when auditing for fairness, the leveling down effect, task difficulty variations among subgroups, discovering biases in unseen populations, and explaining biases beyond standard demographic attributes.

Figure: Potential sources of bias in AI for medical image computing: data, model design, and developers.

Overview

  • The paper examines fairness issues in AI-based medical imaging, highlighting biases introduced at various stages from data collection to model deployment.

  • It categorizes sources of bias into data, models, and people, and explores how demographic attributes, model design, and the diversity of research teams impact fairness.

  • The paper discusses the shortcomings of commonly used fairness measures, the difficulty of ensuring unbiased performance across diverse populations, and the importance of considering factors beyond standard demographic attributes.

Overview of Challenges in Ensuring Fairness in AI for Medical Imaging

The paper, "Open Challenges on Fairness of Artificial Intelligence in Medical Imaging Applications" by Enzo Ferrante and Rodrigo Echeveste, from the Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), CONICET, and Universidad Nacional del Litoral, provides a thorough examination of fairness issues in AI-driven medical imaging. It highlights the multifaceted nature of biases introduced at various stages of the pipeline, from data collection to model deployment, and suggests potential paths for future research in this domain.

Sources of Bias in AI Systems

The paper categorizes sources of bias into three broad classes: data, models, and people. This classification facilitates targeted interventions:

  1. Data: Data imbalance and domain shifts are significant sources of bias. The paper discusses how demographic attributes such as race, gender, and socioeconomic status can shape the data distribution and, consequently, the model's performance (a stratified audit sketch is shown after this list). Intersectionality and healthcare disparities are also noted as critical factors that can introduce bias. For instance, disparities in health access can lead to variations in disease progression, which, if not accounted for, can result in biased models.
  2. Models: Different ML models carry inherent inductive biases based on their design and training strategies. The paper emphasizes the need for designing models with appropriate inductive biases that generalize well across diverse sub-populations. The authors highlight challenges with deep learning models, such as their proneness to shortcut learning and calibration issues, particularly affecting under-represented groups.
  3. People: The composition of the research and development teams plays a crucial role in the biases exhibited by AI systems. The AI community's limited diversity can restrict the consideration of scenarios that affect underrepresented groups. This aspect underlines the necessity of a globally inclusive approach to data acquisition, model development, and deployment.
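To make the data-related concerns above concrete, the following is a minimal sketch of a subgroup-stratified audit: it computes discrimination (AUC) and calibration (ECE) per intersectional subgroup and reports the worst-group values alongside the full table. The column names, grouping variables, and binning choice are illustrative assumptions, not part of the chapter.

```python
# Minimal subgroup-stratified fairness audit (illustrative; column names such as
# "sex", "age_group", "y_true", and "y_prob" are assumptions, not the chapter's).
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width binning ECE for a binary classifier."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(y_prob[mask].mean() - y_true[mask].mean())
    return ece


def audit(df, group_cols=("sex", "age_group")):
    """Report AUC and ECE per (intersectional) subgroup plus worst-group values.
    Each subgroup must contain both classes for AUC to be defined."""
    rows = []
    for keys, g in df.groupby(list(group_cols)):
        rows.append({
            "group": keys,
            "n": len(g),
            "auc": roc_auc_score(g["y_true"], g["y_prob"]),
            "ece": expected_calibration_error(g["y_true"].to_numpy(),
                                              g["y_prob"].to_numpy()),
        })
    report = pd.DataFrame(rows)
    print(report)
    print("worst-group AUC:", report["auc"].min())
    print("worst-group ECE:", report["ece"].max())
    return report
```

Reporting the worst-group values, not only the averages or the gaps between groups, is one simple way to keep an audit oriented toward "leveling up" rather than merely shrinking disparities.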

Biased Metrics and the Leveling Down Effect

The paper identifies several pitfalls associated with fairness audits:

  • Metric Bias: Metrics like the Dice coefficient for segmentation and the Expected Calibration Error (ECE) for probabilistic outputs can be biased with respect to properties that vary across demographic groups, which can result in misleading fairness assessments. For example, the Dice coefficient tends to be lower for smaller anatomical structures and lower-resolution images, which may correlate with certain demographic attributes (see the toy simulation after this list).
  • Leveling Down Effect: Pursuing fairness by simply reducing performance disparities can lead to overall degraded performance. The paper promotes the concept of "leveling up" rather than "leveling down," emphasizing the need for interventions that enhance the performance of disadvantaged groups without impairing the well-performing ones.
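As a concrete illustration of the Dice point above, the toy simulation below applies the same one-pixel boundary error to discs of different radii: the Dice score drops far more for the small structure, so a population whose anatomy is systematically smaller, or imaged at lower resolution, can look "worse" even when the segmentation error is identical. The radii and error model are illustrative assumptions, not taken from the chapter.

```python
# Toy illustration: an identical 1-pixel boundary error yields very different
# Dice scores depending on structure size (radii below are arbitrary choices).
import numpy as np


def disc(radius, size=256):
    """Binary mask of a centered disc."""
    yy, xx = np.mgrid[:size, :size]
    return (xx - size // 2) ** 2 + (yy - size // 2) ** 2 <= radius ** 2


def dice(a, b):
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


for r in (5, 20, 80):
    gt = disc(r)
    pred = disc(r - 1)  # same 1-pixel under-segmentation for every size
    print(f"radius {r:3d}px  ->  Dice = {dice(gt, pred):.3f}")
```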

Task Difficulty and Bias Audits

Disparities in task difficulty across demographic subgroups complicate fairness evaluations. For example, diagnosing thoracic diseases in X-ray images is more challenging in women due to anatomical differences. The paper underscores the necessity of recognizing these intrinsic task difficulties when evaluating model performance and fairness.

Discovering Biases in Unseen Populations

Fairness properties of AI systems may not transfer across different populations due to distribution shifts such as demographic and covariate shifts. The paper emphasizes preemptive measures and methodologies to uncover biases in new, unlabeled populations, arguing for the importance of quality indices and performance estimators that do not rely on ground-truth annotations.
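One simple family of ground-truth-free estimators relies on the model's own confidence on the new population. The sketch below calibrates a confidence threshold on labeled source data and then estimates accuracy on an unlabeled target set as the fraction of predictions above that threshold (an average-thresholded-confidence-style idea). This is only one possible proxy, not the chapter's prescribed method; the synthetic data and the max-softmax confidence choice are assumptions for illustration.

```python
# Sketch of a label-free performance proxy for a new population: pick a
# confidence threshold on labeled source data, then estimate target accuracy
# as the fraction of target predictions exceeding that threshold.
import numpy as np


def estimate_accuracy_without_labels(src_conf, src_correct, tgt_conf):
    """src_conf: max-softmax confidences on labeled source data.
    src_correct: 0/1 correctness of the source predictions.
    tgt_conf: max-softmax confidences on the unlabeled target population."""
    # Threshold chosen so that P(conf > t) on source matches source accuracy.
    t = np.quantile(src_conf, 1.0 - src_correct.mean())
    return float((tgt_conf > t).mean())


# Tiny synthetic example (all distributions are illustrative assumptions).
rng = np.random.default_rng(0)
src_conf = rng.beta(5, 2, size=1000)            # source confidences
src_correct = (rng.random(1000) < src_conf)     # correctness tracks confidence
tgt_conf = rng.beta(3, 3, size=1000)            # shifted target confidences
print("estimated target accuracy:",
      estimate_accuracy_without_labels(src_conf, src_correct.astype(float), tgt_conf))
```

Applied per subgroup of the target population, such proxies can flag candidate performance gaps before any ground-truth annotations become available.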

Beyond Standard Demographic Attributes

Auditing biases solely based on standard demographic attributes such as age, gender, and ethnicity may overlook other significant factors. The authors advocate for considering anatomical properties, hospitalization conditions, and imaging device characteristics in bias audits. They recommend innovative methods like adversarial reweighting and counterfactual explanations to uncover biases beyond standard attributes.
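As a rough sketch of the adversarial-reweighting idea (in the spirit of adversarially reweighted learning), the snippet below lets a small adversary upweight examples on which the learner does poorly, using only input features and no demographic labels; the highest-weight examples can then be inspected to surface problematic slices beyond the standard attributes. The architectures, synthetic data, and hyperparameters are illustrative assumptions, not the chapter's implementation.

```python
# Rough sketch of adversarial reweighting: a learner minimizes a weighted loss
# while an adversary, seeing only the inputs, learns weights that maximize it.
# All sizes and hyperparameters below are arbitrary illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 512, 16
x = torch.randn(n, d)                                 # synthetic features
y = (x[:, 0] + 0.5 * torch.randn(n) > 0).float()      # synthetic binary labels

learner = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
opt_l = torch.optim.Adam(learner.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss(reduction="none")


def example_weights(inputs):
    """Adversary maps inputs to normalized per-example weights."""
    s = torch.sigmoid(adversary(inputs).squeeze(-1))
    return 1.0 + inputs.shape[0] * s / (s.sum() + 1e-8)


for step in range(200):
    # Learner step: minimize the adversarially weighted loss.
    per_example = bce(learner(x).squeeze(-1), y)
    w = example_weights(x).detach()
    opt_l.zero_grad()
    (w * per_example).mean().backward()
    opt_l.step()

    # Adversary step: maximize the same weighted loss (ascent via negation),
    # implicitly locating hard, possibly under-served, regions of input space.
    per_example = bce(learner(x).squeeze(-1), y).detach()
    w = example_weights(x)
    opt_a.zero_grad()
    (-(w * per_example).mean()).backward()
    opt_a.step()

# Examples with the largest learned weights point at candidate hidden subgroups.
top = example_weights(x).detach().topk(10).indices
print("indices of highest-weight examples:", top.tolist())
```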

Conclusion

This paper provides an in-depth exploration of the persistent challenges within the domain of fairness in AI-driven medical imaging. By categorizing biases, scrutinizing auditing metrics, and emphasizing the nuanced difficulties across demographic groups, the authors draw attention to the multifaceted nature of the problem. Their insights serve as a call to action for the research community to develop robust methodologies that address these challenges holistically, ensuring equitable and effective medical AI systems.
