Improving the CXR Reports Generation with Multi-modal feature Alignment and Self-Refining strategy
Authors: Cheddi, F., Habbani, A. and Nait-Charif, H.
Journal: 2024 3rd International Conference on Embedded Systems and Artificial Intelligence (ESAI 2024)
DOI: 10.1109/ESAI62891.2024.10913509
Abstract: Medical chest X-ray images are essential for diagnosing a range of diseases, so automated interpretation and report generation of these images is crucial: it saves radiologists' time and reduces the risk of diagnostic mistakes. However, this task is hampered by the uni-directional image-to-report design of encoder-decoder deep learning models and by the absence of contextual details during report generation, which can lead to incomplete or inaccurate descriptions. To address this, we propose an approach based on a Multi-modal feature Alignment and Self-Refining mechanism (RG-MASR) to automatically generate improved medical reports from chest X-ray images. Our method comprises three modules. First, a visual and textual feature extraction module extracts semantic features from X-ray images and their paired reports. Second, a multimodal feature alignment module leverages both textual and visual features. Finally, a self-refining technique integrated into the report generator module refines the alignment and improves the output, producing a comprehensive report. We evaluate our method on the IU X-ray and NIH public chest X-ray datasets. The results demonstrate that our proposed RG-MASR surpasses existing approaches on the ROUGE and BLEU metrics.
Source: Scopus
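
The abstract does not specify the implementation of the alignment or self-refining modules. The following is a minimal sketch of one plausible reading, assuming cross-attention for the multimodal alignment and a fixed number of decode-and-realign passes for the self-refinement; all class names, dimensions, and design choices here are illustrative assumptions, not the authors' RG-MASR implementation.

```python
# Hypothetical sketch of multi-modal feature alignment + self-refining report
# generation, assuming a standard cross-attention design. This is NOT the
# paper's actual RG-MASR architecture, whose details the abstract omits.
import torch
import torch.nn as nn


class FeatureAlignment(nn.Module):
    """Aligns visual (X-ray) features with textual (report) features via
    cross-attention, a common choice for this kind of alignment module."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, textual: torch.Tensor) -> torch.Tensor:
        # Visual tokens attend to textual tokens; residual + norm keeps the
        # original visual information alongside the aligned features.
        aligned, _ = self.cross_attn(query=visual, key=textual, value=textual)
        return self.norm(visual + aligned)


class SelfRefiningGenerator(nn.Module):
    """Report generator that re-aligns and re-decodes its own draft features
    for a fixed number of passes (one plausible reading of 'self-refining')."""

    def __init__(self, dim: int = 256, vocab_size: int = 10_000, steps: int = 2):
        super().__init__()
        self.align = FeatureAlignment(dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab_size)
        self.steps = steps

    def forward(self, visual: torch.Tensor, draft: torch.Tensor) -> torch.Tensor:
        # Each pass: align image features with the current draft report
        # features, then decode the draft again against the aligned memory.
        for _ in range(self.steps):
            memory = self.align(visual, draft)
            draft = self.decoder(tgt=draft, memory=memory)
        return self.head(draft)  # per-token vocabulary logits


if __name__ == "__main__":
    # Toy shapes: one image with 49 patch tokens, a 20-token draft report.
    visual = torch.randn(1, 49, 256)
    draft = torch.randn(1, 20, 256)
    logits = SelfRefiningGenerator()(visual, draft)
    print(logits.shape)  # torch.Size([1, 20, 10000])
```

Under this reading, the refinement loop lets errors in an initial draft be corrected against the image evidence before the final tokens are emitted; the actual paper should be consulted for how RG-MASR realizes this in practice.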