Issue | BIO Web Conf., Volume 174, 2025; 2025 7th International Conference on Biotechnology and Biomedicine (ICBB 2025)
---|---
Article Number | 03016
Number of page(s) | 8
Section | Technologies and Methodologies in Biomedical Research
DOI | https://doi.org/10.1051/bioconf/202517403016
Published online | 12 May 2025
ScBlkCom: An Integrated Compression Algorithm for Single-Cell RNA Sequencing Data
1 School of Biological Science and Medical Engineering, Southeast University, Nanjing 211189, China
2 Xinge Yuan Biotechnology Co., Ltd., Nanjing 211189, China
* Corresponding author: Fan Jue, fanjue@singleronbio.com; Xiao Sun, xsun@seu.edu.cn
Advances in high-throughput sequencing have shifted the bottleneck in genomic projects from data generation to computational storage and analysis. Single-cell RNA-seq (scRNA-seq) data has distinctive structural features, including extensive labeled sequence identifiers, that conventional compression tools do not exploit. This study proposes ScBlkCom, a specialized compression scheme for scRNA-seq data. The method partitions sequencing data into distinct blocks and applies a tailored compression strategy to each: differential encoding for numerical attributes, Huffman coding for categorical labels, and context-adaptive encoding for sequence identifiers. Experiments show that ScBlkCom achieves an 84.29% higher compression gain than single-module approaches and outperforms generic tools (e.g., GZIP, BZIP2) by 6.44% in compression ratio, while maintaining stable processing speeds. This block-wise adaptive framework effectively addresses redundancy in scRNA-seq data and improves storage efficiency for large-scale single-cell studies.
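Two of the per-block strategies named in the abstract, differential encoding and Huffman coding, can be illustrated with a minimal Python sketch. This is not the ScBlkCom implementation: the function names, the bit-string representation, and the idea of treating one list as a "numerical block" and another as a "label block" are assumptions made purely for illustration, and the context-adaptive identifier encoding is not shown.

```python
import heapq
from collections import Counter
from itertools import count


def delta_encode(values):
    """Differential encoding: keep the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(symbols, code):
    """Concatenate the codeword of each symbol into one bit string."""
    return "".join(code[s] for s in symbols)


def huffman_decode(bits, code):
    """Decode a bit string; works because no codeword is a prefix of another."""
    inverse = {b: s for s, b in code.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out


# Hypothetical blocks: slowly increasing numbers and a skewed label distribution.
numeric_block = [1000, 1004, 1004, 1010]
label_block = ["A"] * 5 + ["B"] * 2 + ["C"]

deltas = delta_encode(numeric_block)
code = huffman_code(label_block)
bits = huffman_encode(label_block, code)
```

In this sketch the differences are small numbers that need fewer bits than the raw values, and the most frequent label receives the shortest codeword. Both transforms are lossless, so `delta_decode(deltas)` and `huffman_decode(bits, code)` recover the original blocks.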
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.