A Benchmark for Multi-Task Evaluation of Pretrained Models in Medical Report Generation

Open Access

Issue		BIO Web Conf., Volume 174, 2025: 2025 7th International Conference on Biotechnology and Biomedicine (ICBB 2025)


Article Number		03010
Number of page(s)		7
Section		Technologies and Methodologies in Biomedical Research
DOI		https://doi.org/10.1051/bioconf/202517403010
Published online		12 May 2025

