- The paper introduces CINFormer, a novel architecture that injects multi-stage CNN features into a transformer to enhance surface defect detection.
- It leverages a UNet-like structure and a Top-K self-attention module to focus on critical tokens and suppress background noise.
- Extensive experiments show that CINFormer outperforms traditional CNN and transformer models on challenging industrial datasets.
Introduction
The paper "CINFormer: Transformer Network with Multi-Stage CNN Feature Injection for Surface Defect Segmentation" proposes a novel approach for surface defect inspection in industrial processes. Despite advancements in deep learning-based defect detection, challenges persist due to indistinguishable weak defects and defect-like interference from backgrounds. The paper introduces CINFormer, a UNet-like architecture that integrates CNN features into a transformer network to enhance the segmentation of surface defects. This architecture leverages the strengths of CNNs in capturing detailed features and transformers in mitigating background noise, improving the accuracy of defect detection.
Figure 1: Comparison of feature visualization for CNN (a), transformer (b), and the proposed CINFormer (c). It can be observed that CINFormer can better focus on defect areas and suppress redundant background interference.
CINFormer is built upon a UNet-like structure in which the encoder integrates multi-level CNN features into different stages of a transformer network. The injected CNN features preserve the ability to capture fine-grained defect details, while the transformer component suppresses noise interference. Specifically, a CNN stem generates hierarchical features that are injected into successive transformer stages, retaining detailed information that would otherwise be lost for defect identification.
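The injection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each stage flattens a CNN feature map into tokens, applies a learned linear projection to match the transformer width, and adds the result to that stage's transformer tokens. All names, dimensions, and the additive fusion are hypothetical simplifications.

```python
import numpy as np

def inject_cnn_features(trans_tokens, cnn_feat, proj):
    """Flatten a CNN feature map into tokens, project it to the
    transformer width, and add it to the matching stage's tokens.
    trans_tokens: (H*W, C_trans), cnn_feat: (H, W, C_cnn),
    proj: (C_cnn, C_trans) hypothetical learned projection."""
    h, w, c_cnn = cnn_feat.shape
    cnn_tokens = cnn_feat.reshape(h * w, c_cnn)  # (H*W, C_cnn)
    return trans_tokens + cnn_tokens @ proj      # fused tokens, (H*W, C_trans)

# Toy multi-stage setup: two stages with different widths (values are illustrative).
rng = np.random.default_rng(0)
stage_dims = [32, 64]        # transformer channel widths per stage
cnn_channels = [16, 24]      # CNN feature channels per stage
tokens = [rng.normal(size=(64, d)) for d in stage_dims]           # 8x8 token grids
feats = [rng.normal(size=(8, 8, c)) for c in cnn_channels]        # CNN stem outputs
projs = [rng.normal(size=(c, d)) for c, d in zip(cnn_channels, stage_dims)]

fused = [inject_cnn_features(t, f, p) for t, f, p in zip(tokens, feats, projs)]
```

The key design point is that injection happens at every encoder stage rather than only at the input, so detail from the CNN stem reaches deep transformer layers directly.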
The incorporation of the Top-K self-attention module further refines this process by focusing on tokens with more critical information, effectively suppressing redundant background details. This module ranks tokens and channels based on their variance, retaining the most informative aspects for defect highlighting.
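The token-selection idea can be sketched as below. This is a simplified, hypothetical rendering of the Top-K mechanism under the assumptions that informativeness is measured by per-token variance across channels and that plain scaled dot-product attention is applied only among the retained tokens, leaving the rest unchanged; the real module also ranks channels and is trained end-to-end.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_self_attention(tokens, k):
    """Sketch of Top-K self-attention: rank tokens by channel variance,
    attend only among the k most informative ones, keep the rest as-is.
    tokens: (N, C) array; returns (updated tokens, kept indices)."""
    n, c = tokens.shape
    variance = tokens.var(axis=1)            # per-token variance across channels
    keep = np.argsort(variance)[::-1][:k]    # indices of the k highest-variance tokens
    sel = tokens[keep]                       # (k, C) selected tokens
    # Scaled dot-product attention restricted to the selected tokens.
    attn = softmax(sel @ sel.T / np.sqrt(c), axis=-1)
    out = tokens.copy()
    out[keep] = attn @ sel                   # update only the selected tokens
    return out, keep

tokens = np.random.default_rng(1).normal(size=(16, 8))
out, keep = topk_self_attention(tokens, k=4)
```

Restricting attention to the top-k tokens both cuts computation (attention cost drops from O(N²) to O(k²)) and prevents low-variance background tokens from diluting the attention weights.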
Figure 2: The architecture of the CINFormer, depicting its UNet-like encoder-decoder design with CNN feature injection and the Top-K self-attention mechanism.
Experimental Evaluation
Extensive experiments were conducted on datasets such as DAGM 2007, Magnetic Tile, and NEU to demonstrate CINFormer's efficacy. The results show consistent performance improvements across these datasets compared to existing methods. The architecture outperforms both CNN-based and transformer-based models, indicating an effective synergy between local feature capture and global context modeling.
CINFormer showed superior performance on challenging datasets with weak defects and complex backgrounds, underlining its robust capability in practical industrial scenarios.
Figure 3: Visualization of segmentation results obtained by various methods, highlighting CINFormer’s accuracy in defect segmentation across different datasets.
Ablation Studies
The paper’s ablation studies attribute significant performance gains to the multi-stage CNN feature injection, which is contrasted with alternative bidirectional and post-transformer injection schemes. Additionally, integrating the Top-K self-attention mechanism improved both efficiency and detection effectiveness by selectively processing the most critical feature components.
Figure 4: Illustration of the Top-K self-attention mechanism, showcasing various stages of token and channel selection.
Implications and Future Work
CINFormer offers substantial practical implications in automated industrial inspection processes, providing an adaptable framework for environments with varied defect detection challenges. By effectively leveraging CNN and transformer architectures, the approach fosters improved feature representation and noise suppression.
Future research could explore further optimizations in self-attention mechanisms and enhanced integration techniques to expand the applicability of CINFormer across broader defect typologies and industries.
Conclusion
CINFormer presents a sophisticated yet computationally efficient solution for surface defect segmentation, combining the detailed feature representation of CNNs with the global context awareness of transformers. The Top-K self-attention mechanism emphasizes informative features while minimizing background interference, leading to state-of-the-art performance across diverse defect scenarios. The paper marks a significant step forward in applying advanced neural architectures to industrial defect detection tasks.