COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios (2004.05835v3)

Published 13 Apr 2020 in cs.LG and stat.ML

Abstract: COVID-19 can cause severe pneumonia and is estimated to have a high impact on the healthcare system. The standard imaging tests for diagnosing pneumonia are chest X-ray (CXR) and computed tomography (CT). CXR is useful because it is cheaper, faster, and more widespread than CT. This study aims to distinguish pneumonia caused by COVID-19 from other types of pneumonia and from healthy lungs using only CXR images. To achieve this, we have proposed a classification schema that considers both multi-class and hierarchical perspectives, since pneumonia can be structured as a hierarchy. Given the natural data imbalance in this domain, we also proposed the use of resampling algorithms to rebalance the class distribution. Our classification schema extracts features using several well-known texture descriptors and a pre-trained CNN model. We also explored early and late fusion techniques to leverage the strengths of multiple texture descriptors and base classifiers at once. To evaluate the approach, we composed a database, named RYDLS-20, containing CXR images of pneumonia caused by different pathogens as well as CXR images of healthy lungs. The class distribution follows a real-world scenario in which some pathogens are more common than others. The proposed approach achieved a macro-averaged F1-Score of 0.65 using a multi-class approach and an F1-Score of 0.89 for COVID-19 identification in the hierarchical classification scenario. As far as we know, this is the best nominal rate obtained for COVID-19 identification in an unbalanced environment with more than three classes. We also highlight the novel hierarchical classification approach proposed for this task, which considers the types of pneumonia caused by different pathogens and led to the best COVID-19 recognition rate obtained here.

Citations (396)

Summary

The paper combines handcrafted texture descriptors with features from a pre-trained CNN to identify COVID-19 pneumonia in chest X-ray images.
It uses resampling to address class imbalance and early and late fusion to combine descriptors and classifiers, in both flat and hierarchical classification scenarios.
Experimental results show a macro-averaged F1-score of 0.65 in the multi-class scenario and an F1-score of 0.89 for COVID-19 identification in the hierarchical scenario.

The paper explores automatic COVID-19 identification from chest X-ray (CXR) images in flat and hierarchical classification scenarios. The task is to distinguish pneumonia caused by COVID-19 from pneumonia caused by other pathogens and from healthy lungs, using only CXR images; CT is the other standard imaging test but is not used. CXR is preferred because it is cheaper, faster, and more widely available than CT. The paper compares a flat multi-class formulation with a hierarchical one, motivated by the fact that pneumonia types can be organized into a hierarchy according to their pathogens.
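To make the hierarchical framing concrete, here is a minimal sketch of how a leaf label implies all of its ancestors, which is what turns a flat label into a hierarchical (multi-label) target. The tree below (the `PARENT` map and intermediate nodes such as `Viral`, `Bacterial`, and `Fungal`) is a simplified, hypothetical example and not necessarily the exact hierarchy used for RYDLS-20.

```python
# Hypothetical, simplified pathogen hierarchy: child -> parent (None = top level).
PARENT = {
    "Normal": None,
    "Pneumonia": None,
    "Viral": "Pneumonia",
    "Bacterial": "Pneumonia",
    "Fungal": "Pneumonia",
    "COVID-19": "Viral",
    "SARS": "Viral",
    "MERS": "Viral",
    "Varicella": "Viral",
    "Streptococcus": "Bacterial",
    "Pneumocystis": "Fungal",
}

# Fixed node order for the multi-label target vector.
NODES = sorted(PARENT)


def label_path(leaf: str) -> list:
    """Return the top-down path of labels implied by a leaf label."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def encode(leaf: str) -> list:
    """Binary vector over NODES: 1 for the leaf and every ancestor."""
    active = set(label_path(leaf))
    return [1 if node in active else 0 for node in NODES]


print(label_path("COVID-19"))  # ['Pneumonia', 'Viral', 'COVID-19']
print(sum(encode("COVID-19")))  # 3
```

Under this encoding, a classifier that predicts "Viral" but the wrong virus still gets the upper levels right, which a flat formulation cannot express.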

Methodological Framework

To tackle the inherent class imbalance of this clinical scenario, the paper applies resampling algorithms to rebalance the class distribution. Features are extracted with a hybrid approach that combines traditional texture descriptors (e.g., LBP, EQP) with representations learned by a pre-trained CNN (Inception-V3). The paper further explores early fusion (combining the features of multiple descriptors) and late fusion (combining the outputs of multiple classifiers) to exploit their complementary strengths.
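A minimal sketch of the two fusion styles, assuming each descriptor yields a plain list of floats and each base classifier outputs a class-probability vector. The function names, the example values, and the sum/product/max rules are illustrative choices and not necessarily the fusion rules used in the paper.

```python
import math


def early_fusion(*feature_vectors):
    """Early fusion: concatenate per-descriptor feature vectors into one."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def late_fusion(prob_vectors, rule="sum"):
    """Late fusion: combine per-classifier probability vectors with a fixed
    rule and return the index of the winning class."""
    n_classes = len(prob_vectors[0])
    per_class = [[p[c] for p in prob_vectors] for c in range(n_classes)]
    if rule == "sum":
        scores = [sum(col) for col in per_class]
    elif rule == "product":
        scores = [math.prod(col) for col in per_class]
    elif rule == "max":
        scores = [max(col) for col in per_class]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.index(max(scores))


# Hypothetical descriptor outputs for one image.
lbp_hist = [0.2, 0.5, 0.3]
cnn_feats = [1.4, 0.0, 2.7, 0.9]
print(len(early_fusion(lbp_hist, cnn_feats)))  # 7

# Hypothetical class probabilities from three base classifiers.
probs = [[0.9, 0.05, 0.05], [0.1, 0.6, 0.3], [0.2, 0.6, 0.2]]
print(late_fusion(probs, "sum"), late_fusion(probs, "max"))  # 1 0
```

The toy example shows that the choice of rule matters: the sum rule favors the class with broad support across classifiers, while the max rule follows the single most confident classifier.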

The database compiled for the experiments, named RYDLS-20, contains CXR images of pneumonia caused by different pathogens as well as images of healthy lungs, and its class distribution reflects real-world differences in pathogen prevalence. This lets the classifiers be evaluated under a realistic, imbalanced class distribution.
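The resampling idea can be illustrated with a minimal SMOTE-style oversampler, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The pure-Python `smote` function below is a hypothetical teaching sketch with toy 2-D data; it omits the cleaning steps that ENN or SMOTE+TL (SMOTE followed by Tomek-link removal) add on top.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, nearest first (Euclidean distance).
        others = sorted(
            (x for j, x in enumerate(minority) if j != i),
            key=lambda x: math.dist(base, x),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority-class features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=5)
print(len(new_samples))  # 5
```

In practice, the imbalanced-learn library provides `SMOTE`, `SMOTETomek`, and `EditedNearestNeighbours`, each exposing a `fit_resample` method, and would normally be preferred over a hand-rolled version.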

Experimental Findings

In the multi-class scenario, the classification schema achieved a macro-averaged F1-Score of 0.65; MLP classifiers performed notably well, together with LBP features and ENN (Edited Nearest Neighbours) resampling. In the hierarchical scenario, the method attained an F1-Score of 0.89 specifically for COVID-19, supporting the use of hierarchical approaches for identifying pneumonia by pathogen type. The hierarchical results relied on texture descriptors, notably the combination of BSIF, EQP, and LPQ with SMOTE+TL (SMOTE followed by Tomek links) resampling, which gave the best COVID-19 recognition rate.
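Macro-averaging explains why a single class can score well above the overall figure: macro F1 is the unweighted mean of per-class F1 scores, so poorly recognized classes pull it down regardless of how many samples they have. A minimal sketch with hypothetical labels and toy predictions (not data from the paper):

```python
def f1_per_class(y_true, y_pred, labels):
    """Per-class F1 = 2TP / (2TP + FP + FN); 0.0 when the class appears in
    neither y_true nor y_pred."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


labels = ["COVID-19", "Normal", "Other"]
y_true = ["Normal"] * 4 + ["COVID-19"] * 2 + ["Other"] * 2
y_pred = ["Normal"] * 3 + ["Other"] + ["COVID-19"] * 2 + ["Normal"] * 2
print(f1_per_class(y_true, y_pred, labels)["COVID-19"])  # 1.0
print(round(macro_f1(y_true, y_pred, labels), 3))  # 0.556
```

Here COVID-19 is recognized perfectly, yet the macro average drops to about 0.56 because the "Other" class is never predicted correctly.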

Discussion and Implications

This work illustrates the potential of hierarchical classification, compared with flat classification, for recognizing pneumonia types from CXR images when the classes follow a natural pathogen hierarchy. The relatively small dataset is a limitation, but the results support the use of hierarchical models in medical image analysis.

Future studies could expand the dataset to better support deep learning and use cross-validation to make the evaluation more robust. Alternative classifier configurations, including local classifiers, and additional feature sets could further improve detection performance.
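A local-classifier configuration typically predicts top-down: a classifier at each internal node chooses one child, and prediction stops at a leaf. A minimal sketch of a local classifier per parent node, where the tree and the threshold rules standing in for trained classifiers are hypothetical:

```python
def predict_top_down(x, children, local_clf, root="ROOT"):
    """Local classifier per parent node: starting at the root, let each
    internal node's classifier pick one child until a leaf is reached."""
    path = []
    node = root
    while children.get(node):
        child = local_clf[node](x)
        if child not in children[node]:
            raise ValueError(f"{child!r} is not a child of {node!r}")
        path.append(child)
        node = child
    return path


# Hypothetical tree and threshold rules standing in for trained classifiers.
children = {
    "ROOT": ["Normal", "Pneumonia"],
    "Pneumonia": ["Viral", "Bacterial"],
    "Viral": ["COVID-19", "Other viral"],
}
local_clf = {
    "ROOT": lambda x: "Pneumonia" if x[0] > 0.5 else "Normal",
    "Pneumonia": lambda x: "Viral" if x[1] > 0.5 else "Bacterial",
    "Viral": lambda x: "COVID-19" if x[2] > 0.5 else "Other viral",
}
print(predict_top_down([0.9, 0.8, 0.7], children, local_clf))
# ['Pneumonia', 'Viral', 'COVID-19']
```

A known drawback of this scheme is error propagation: a wrong decision at an upper node cannot be corrected further down the tree.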

The research suggests that such AI techniques could support early diagnosis and streamline healthcare workflows in clinical settings, especially during global health crises like COVID-19.

PDF Markdown