Issue | BIO Web Conf., Volume 174, 2025: 2025 7th International Conference on Biotechnology and Biomedicine (ICBB 2025)
---|---
Article Number | 03001
Number of page(s) | 4
Section | Technologies and Methodologies in Biomedical Research
DOI | https://doi.org/10.1051/bioconf/202517403001
Published online | 12 May 2025
A Review on Optimal Subsampling Method
The High School Affiliated to Renmin University of China, Beijing, 100080, China
* Corresponding author’s e-mail: 2860581245@qq.com
Optimal subsampling is an efficient approach to massive data because it both reduces the amount of data and saves computational time. Subsampling methods have long been essential to statistical analysis. In this article, we discuss several prominent subsampling methods, including subsampling based on leverage, the A/L-optimality criterion, the D-optimality criterion, and the Poisson subsampling strategy. For linear models, we find that leverage-based subsampling is simple to apply. Subsampling based on A/L-optimality serves as a general approach applicable to numerous models, including linear models and generalized linear models. Subsampling based on D-optimality proves more effective than other methods only in the case of linear models. Compared with subsampling with replacement, Poisson sampling is a more efficient subsampling technique, requiring less memory and less processing time.
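To make two of the ideas in the abstract concrete, the following is a minimal sketch of leverage-based subsampling for a linear model, drawn once with replacement and once by Poisson sampling. It is an illustration under stated assumptions, not the article's own procedure: the function names, the simulated data, the subsample size `r`, and the inverse-probability weighting are all choices made here for the example.

```python
import numpy as np

def leverage_scores(X):
    # Leverage scores are the diagonal of the hat matrix X (X^T X)^{-1} X^T.
    # With a thin QR factorization X = QR they equal the squared row norms of Q.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def weighted_ls(X, y, w):
    # Weighted least squares, solved by scaling each row by sqrt(weight).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def subsample_with_replacement(X, y, probs, r, rng):
    # Draw r row indices with replacement; all n probabilities are needed up front.
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    # Assumption: inverse-probability weights to offset the unequal sampling.
    return weighted_ls(X[idx], y[idx], 1.0 / probs[idx])

def poisson_subsample(X, y, probs, r, rng):
    # Keep each row independently with probability min(1, r * p_i),
    # so the subsample size is random with expectation of about r.
    keep_prob = np.minimum(1.0, r * probs)
    mask = rng.random(len(y)) < keep_prob
    return weighted_ls(X[mask], y[mask], 1.0 / keep_prob[mask])

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, d, r = 20000, 5, 2000
X = rng.standard_normal((n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + rng.standard_normal(n)

h = leverage_scores(X)
probs = h / h.sum()  # sampling probabilities proportional to leverage

beta_rep = subsample_with_replacement(X, y, probs, r, rng)
beta_poi = poisson_subsample(X, y, probs, r, rng)
print(beta_rep)
print(beta_poi)
```

In the Poisson variant each inclusion decision depends only on that row's own probability, so the decisions could be made while streaming through the data, whereas the with-replacement draw operates on the full probability vector at once.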
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.