| Issue | BIO Web Conf., Volume 200, 2025: Biology, Health & Artificial Intelligence Conference (BHAI 2025) |
|---|---|
| Article Number | 01022 |
| Number of page(s) | 4 |
| DOI | https://doi.org/10.1051/bioconf/202520001022 |
| Published online | 05 December 2025 |
Towards a robust healthcare prediction model using an adaptive multimodal fusion based on hierarchical transformers
Computer Science Research Laboratory (LRI), Faculty of Sciences, Ibn Tofail University, Morocco
The healthcare field has adopted multimodal learning, which combines diverse data types to improve the precision of predictions in clinical settings. The limitations of such multimodal approaches stem largely from heterogeneous data structures, varying modality relevance, noise, and limited data size. In this paper, we present an adaptive multimodal fusion model that assigns distinct importance weights to modalities via attention-based pooling within a hierarchical transformer architecture. The proposed model extracts features for each modality independently and then aggregates cross-modal features using a hierarchical attention mechanism. We evaluate our architecture in a controlled setting across multiple simulations and demonstrate that it performs effectively across four data types for health prediction. Moreover, our experiments reveal clear improvements in validation AUC and F1 scores, especially when data is limited or noisy, which supports the robustness and reliability of our hierarchical transformer-based fusion approach.
Key words: Multimodal Fusion / Hierarchical Transformers / Healthcare Prediction / Generalization and Robustness / Adaptive Attention Mechanisms
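The core idea described in the abstract, assigning importance weights to modality embeddings via attention-based pooling before fusion, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single learnable query vector, and the embedding dimensions are illustrative assumptions; a full model would learn these parameters inside a hierarchical transformer rather than use fixed vectors.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(modality_feats, query):
    """Fuse per-modality embeddings by attention-weighted pooling.

    modality_feats: array of shape (M, d), one embedding per modality
                    (e.g. produced by independent per-modality encoders).
    query:          array of shape (d,), a (hypothetical) learnable
                    attention query vector.
    Returns the fused (d,) embedding and the (M,) modality weights.
    """
    scores = modality_feats @ query      # relevance score per modality
    weights = softmax(scores)            # normalized importance weights
    fused = weights @ modality_feats     # weighted sum over modalities
    return fused, weights

# Toy usage: four modalities (as in the paper's evaluation), d = 8.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
q = rng.normal(size=8)
fused, w = attention_pool(feats, q)
```

The weights sum to one, so noisy or uninformative modalities can be down-weighted without being discarded, which is consistent with the robustness the abstract reports under limited or noisy data.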
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

