- The paper introduces XTransCT, which uses dual X-ray projections and a Transformer network to achieve 3D CT reconstruction in just 44ms.
- The methodology bypasses traditional 3D convolutions by leveraging a pre-trained ResNet backbone and a voxel space search module within the Transformer architecture.
- Empirical results on diverse datasets show significant improvements in SSIM, PSNR, and Dice metrics, outperforming models like X2CTGAN.
Introduction
The research paper presents XTransCT, a novel framework for ultra-fast volumetric CT reconstruction from only two orthogonal X-ray projections, tailored for image-guided radiation therapy (IGRT). The approach employs a voxel-space-searching Transformer network to improve the efficiency and accuracy of CT image reconstruction, which is crucial in clinical settings where minimizing radiation exposure and maximizing reconstruction speed are paramount.
Methodology
The proposed XTransCT algorithm circumvents the constraints of traditional CT reconstruction by eliminating 3D convolutions, which substantially reduces latency while maintaining strong performance. At the core of the model is the Transformer architecture, originally developed for NLP, repurposed here to map X-ray projection data to a volumetric reconstruction.
Dual X-ray Setup
The study employs a dual X-ray setup, with beams set at 45° and 135° angles to ensure robust data acquisition from sparse projections (Figure 1).
Figure 1: Dual X-ray setup. Two X-rays irradiate the patient, and their real-time fusion with CT scans enables precise patient positioning while accounting for potential misalignments.
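To make the acquisition geometry concrete, below is a minimal sketch of how dual DRR-style projections at the 45° and 135° gantry angles could be simulated from a CT volume. It assumes a simplified parallel-beam geometry and illustrative function names; the paper's actual DRR generator and ray geometry may differ.

```python
import numpy as np
from scipy.ndimage import rotate

def simulate_drr(ct_volume: np.ndarray, gantry_angle_deg: float) -> np.ndarray:
    """Approximate a DRR by rotating the CT volume about its axial (z) axis
    and integrating attenuation along the beam direction.

    ct_volume is assumed to be shaped (z, y, x). This is a simplified
    parallel-beam approximation, not the paper's exact DRR procedure.
    """
    # Rotate the in-plane (y, x) slices so the beam direction aligns with one array axis.
    rotated = rotate(ct_volume, angle=gantry_angle_deg, axes=(1, 2),
                     reshape=False, order=1, mode="constant", cval=ct_volume.min())
    # Sum along the beam axis to approximate line integrals of attenuation.
    projection = rotated.sum(axis=2)
    # Normalize to [0, 1] so both views share a common intensity scale.
    projection -= projection.min()
    return projection / (projection.max() + 1e-8)

ct = np.random.rand(128, 256, 256).astype(np.float32)  # placeholder CT volume
drr_45, drr_135 = simulate_drr(ct, 45.0), simulate_drr(ct, 135.0)
```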
Framework
The framework starts by simulating X-ray images from CT scans using the digitally reconstructed radiograph (DRR) method; these images are fed into a pre-trained ResNet backbone. The extracted features are then processed by a voxel-space search module built on a Transformer, which maps queried 3D voxel coordinates to their corresponding voxel intensities and thereby recreates the full 3D CT volume (Figure 2).
Figure 2: Framework of this study. We employ the DRR method to generate X-ray images simulated from CT scans.
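The following is a minimal sketch of the feature-extraction stage described above: each DRR view passes through a pre-trained ResNet whose spatial feature map is linearly projected and flattened into tokens for the Transformer. The choice of ResNet-18, the 256-dimensional token width, and the 3-channel replication of grayscale DRRs are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DualViewBackbone(nn.Module):
    """Extract feature tokens from the two DRR projections (illustrative sketch)."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")
        # Keep everything up to the last convolutional stage (drop avgpool / fc).
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # 1x1 convolution acts as the linear mapping of features to the token width.
        self.proj = nn.Conv2d(512, embed_dim, kernel_size=1)

    def forward(self, xray_a: torch.Tensor, xray_b: torch.Tensor) -> torch.Tensor:
        # Each view: (B, 3, H, W); grayscale DRRs replicated to 3 channels
        # to match the ImageNet-pretrained stem.
        tokens = []
        for view in (xray_a, xray_b):
            fmap = self.proj(self.features(view))           # (B, C, h, w)
            tokens.append(fmap.flatten(2).transpose(1, 2))  # (B, h*w, C)
        return torch.cat(tokens, dim=1)  # concatenated tokens from both views
```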
Network Architecture
The architecture applies a pre-trained ResNet to the DRR images; the resulting features are linearly projected and fed into a Transformer encoder and decoder (Figure 3 and Figure 4). This design aggregates 2D information into 3D more efficiently and effectively than traditional CNN-based methods.
Figure 3: Structure of the ResNet as a backbone.
Figure 4: Structure of the Transformer.
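Below is a hedged sketch of the voxel-space search idea: queried 3D coordinates are embedded as decoder queries that cross-attend to the X-ray feature tokens, and each query is regressed to a voxel intensity. Layer counts, head counts, and dimensions are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class VoxelQueryTransformer(nn.Module):
    """Embed queried voxel coordinates, attend to image tokens, regress intensities."""
    def __init__(self, embed_dim: int = 256, num_layers: int = 4, num_heads: int = 8):
        super().__init__()
        self.coord_embed = nn.Linear(3, embed_dim)  # (x, y, z) -> query token
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True),
            num_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(embed_dim, num_heads, batch_first=True),
            num_layers)
        self.to_intensity = nn.Linear(embed_dim, 1)  # query token -> voxel value

    def forward(self, image_tokens: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # image_tokens: (B, N, C) from the dual-view backbone
        # coords:       (B, V, 3) normalized voxel coordinates in [0, 1]
        memory = self.encoder(image_tokens)
        queries = self.coord_embed(coords)
        decoded = self.decoder(queries, memory)
        return self.to_intensity(decoded).squeeze(-1)  # (B, V) voxel intensities
```

In such a scheme, reconstructing a full volume would amount to querying every coordinate of the target grid (e.g., in chunks of a few thousand voxels) and reshaping the predicted intensities back into 3D.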
Results and Discussion
Evaluation of XTransCT on a 50-patient dataset and on the LIDC-IDRI and LNDb datasets shows consistent improvements in SSIM, PSNR, and Dice over existing models such as X2CT and X2CTGAN (Figure 5, Figure 6, and Figure 7).
Figure 5: Comparative experiment on the 50-patient dataset.
Figure 6: Comparative experiments on the LIDC-IDRI Dataset.
Figure 7: Generalization of the method is verified using comparative experiments on the LNDb Dataset.
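For reference, this is one plausible way the reported metrics could be computed on a reconstructed volume against its ground-truth CT; the binarization threshold for Dice and the normalization to [0, 1] are assumptions, and the paper's exact evaluation protocol may differ.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def dice_score(pred: np.ndarray, target: np.ndarray, threshold: float = 0.5) -> float:
    """Dice overlap between binarized volumes; the threshold is an assumption."""
    p, t = pred > threshold, target > threshold
    intersection = np.logical_and(p, t).sum()
    return 2.0 * intersection / (p.sum() + t.sum() + 1e-8)

def evaluate(pred_ct: np.ndarray, gt_ct: np.ndarray) -> dict:
    """pred_ct and gt_ct are 3D volumes normalized to [0, 1]."""
    return {
        "ssim": structural_similarity(gt_ct, pred_ct, data_range=1.0),
        "psnr": peak_signal_noise_ratio(gt_ct, pred_ct, data_range=1.0),
        "dice": dice_score(pred_ct, gt_ct),
    }
```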
The performance advantage is most notable in speed: each 3D image is reconstructed in merely 44 ms, a stark contrast to previous 3D-convolution-dependent algorithms that require significantly more time. This rapid reconstruction meets the latency requirements of real-time IGRT.
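As a side note, a per-volume latency figure of this kind is typically measured along the following lines; the warm-up and run counts here are illustrative, and the model and inputs are placeholders rather than the authors' benchmark code.

```python
import time
import torch

def measure_latency(model: torch.nn.Module, example_inputs: tuple,
                    n_warmup: int = 10, n_runs: int = 100) -> float:
    """Average per-volume inference time in milliseconds (GPU-synchronized)."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):            # warm-up to stabilize kernels and caches
            model(*example_inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(*example_inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000.0 / n_runs
```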
Conclusion
XTransCT stands out in medical imaging by substantially reducing reconstruction times while maintaining high image quality and structural fidelity. By leveraging Transformers to interpret sparse X-ray projections, the method offers a promising direction for fast, low-radiation imaging in clinical applications. Future work could extend the approach to other imaging modalities and further refine voxel-space interactions for finer detail recovery.