| Issue | BIO Web Conf., Volume 195, 2025: 2025 9th International Conference on Biomedical Engineering and Bioinformatics (ICBEB 2025) | |
|---|---|---|
| Article Number | 03001 | |
| Number of page(s) | 12 | |
| Section | Biomedical Data Analysis and Epidemiological Studies | |
| DOI | https://doi.org/10.1051/bioconf/202519503001 | |
| Published online | 14 November 2025 | |
DiaData: An Integrated Large Dataset for Type 1 Diabetes and Hypoglycemia Research
Professorship of Data Engineering, Helmut Schmidt University, Hamburg, Germany
* e-mail: cinarb@hsu-hh.de
** e-mail: maleshkm@hsu-hh.de
Type 1 diabetes (T1D) is an incurable autoimmune disorder that requires attentive monitoring to avoid large glucose variations. Affected individuals cannot produce sufficient insulin and depend on external insulin injections. Multiple factors influence glucose levels and can lead to the dangerous conditions of hyperglycemia (≥ 180 mg/dL) and hypoglycemia (≤ 70 mg/dL). Data analysis can significantly enhance diabetes care by discovering individual trends and enabling tailored decision support. In particular, machine learning (ML) approaches can provide early alerts and predict glucose levels. However, the main limitation in diabetes research is the lack of large datasets. Therefore, this study systematically integrates 15 datasets to create a comprehensive database of 2510 subjects with glucose measurements recorded every 5 minutes. In total, 149 million measurements are included: euglycemia (58.3%), hyperglycemia (37.5%), and hypoglycemia (4.2%). Moreover, two sub-databases are extracted that additionally include demographics or heart-rate data. The integrated dataset provides a balanced sex distribution and a wide range of ages. As a further contribution, data quality is assessed, revealing that missing values and data imbalance present significant challenges. Thus, applying ML models requires appropriate preprocessing methods.
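The glycemic ranges quoted in the abstract can be illustrated with a minimal sketch that labels readings and computes the share of each class. The thresholds (≤ 70 mg/dL and ≥ 180 mg/dL) come from the abstract; the function names `classify_glucose` and `class_shares` and the sample readings are hypothetical and are not taken from the DiaData code or data.

```python
from collections import Counter

# Thresholds as stated in the abstract (mg/dL).
HYPO_MAX_MG_DL = 70     # hypoglycemia: <= 70
HYPER_MIN_MG_DL = 180   # hyperglycemia: >= 180

LABELS = ("euglycemia", "hyperglycemia", "hypoglycemia")


def classify_glucose(value_mg_dl: float) -> str:
    """Label a single glucose reading by the abstract's thresholds."""
    if value_mg_dl <= HYPO_MAX_MG_DL:
        return "hypoglycemia"
    if value_mg_dl >= HYPER_MIN_MG_DL:
        return "hyperglycemia"
    return "euglycemia"


def class_shares(readings):
    """Return the fraction of readings falling in each glycemic class."""
    counts = Counter(classify_glucose(v) for v in readings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no readings supplied")
    return {label: counts[label] / total for label in LABELS}


# Made-up readings for illustration only (not from the dataset).
sample = [65, 70, 95, 110, 150, 179, 180, 240, 120, 100]
shares = class_shares(sample)
for label in LABELS:
    print(f"{label}: {shares[label]:.1%}")
```

Applied to a full dataset, the same computation would expose the kind of imbalance the abstract reports, where hypoglycemic readings form only a small minority of all measurements.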
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

