Using Decoder-Based Distillation for Enhancing Multilingual Clinical Case Report Summarization
Authors: Rusnachenko, N., Liu, X., Chang, J. and Zhang, J.J.
Journal: CEUR Workshop Proceedings
Volume: 4038
Pages: 544-553
ISSN: 1613-0073
Abstract: Automatic summarization of clinical reports represents an important field of study that contributes to shortening long textual narratives written in various languages. Effective report summarization poses numerous challenges, including the density of medical term mentions and the semantic interdependency among mentioned entities. Recent advances in instruction-tuned models illustrate promising capabilities of models at various scales across numerous fields of Natural Language Processing, including text summarization. A hybrid teacher-student distillation process leverages the power of knowledge distillation by transferring knowledge from a large model (teacher) to a smaller model (student). To the best of our knowledge, numerous existing studies broadly exploit Seq2seq models. Despite their effectiveness for dialogues and summarization of short texts, such techniques have not become common for supporting multilingual and long input contexts. To bridge this gap in exploring distillation tuning, this paper proposes an adaptation of the teacher-student framework for decoder-based systems. In this paper, we experiment with a teacher-student framework for summarizing clinical case reports. We adopt the Qwen2.5 model family and evaluate our setup on the MultiClinSum small dataset. We demonstrate that fine-tuning the 0.5B model with the knowledge transferred from the 72B model results in a 2.4%-4% performance increment on ROUGE metrics compared to the conventional fine-tuning process, highlighting our model's practical benefits in clinical information processing. Our framework is publicly available at https://github.com/nicolay-r/distil-tuning-llm
Source: Scopus
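
A minimal sketch of how such decoder-based teacher-student distillation could look is given below. The abstract does not specify the exact training objective, so this assumes a standard soft-label (logit-matching) distillation loss combined with the usual next-token cross-entropy; the Hugging Face checkpoint names, temperature, and mixing weight alpha are illustrative placeholders, not details taken from the paper.

    import torch
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder checkpoint names (assumed, not confirmed by the paper).
    TEACHER_NAME = "Qwen/Qwen2.5-72B-Instruct"
    STUDENT_NAME = "Qwen/Qwen2.5-0.5B-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(STUDENT_NAME)
    teacher = AutoModelForCausalLM.from_pretrained(TEACHER_NAME, torch_dtype=torch.bfloat16)
    student = AutoModelForCausalLM.from_pretrained(STUDENT_NAME, torch_dtype=torch.bfloat16)
    teacher.eval()

    def distillation_loss(batch, temperature=2.0, alpha=0.5):
        # batch: dict with "input_ids" and "attention_mask" covering a
        # clinical report followed by its reference summary.
        with torch.no_grad():
            teacher_logits = teacher(**batch).logits  # teacher provides soft targets
        student_out = student(**batch, labels=batch["input_ids"])
        ce_loss = student_out.loss  # standard causal-LM cross-entropy
        # KL divergence between softened student and teacher token distributions.
        kd_loss = F.kl_div(
            F.log_softmax(student_out.logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        return alpha * ce_loss + (1.0 - alpha) * kd_loss

Logit-level matching of this kind is only well defined because the Qwen2.5 teacher and student share a tokenizer and vocabulary; in practice the labels would typically mask the input report so that losses are computed on the summary tokens only.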