Issue | BIO Web Conf. Volume 55, 2022: 5th International Conference on Frontiers of Biological Sciences and Engineering (FBSE 2022) | |
---|---|---|
Article Number | 01017 | |
Number of page(s) | 5 | |
DOI | https://doi.org/10.1051/bioconf/20225501017 | |
Published online | 21 November 2022 | |
A large-scale prediction of protein-protein interactions based on random forest and matrix of sequence
1 University College London, Institute of Child Health, London, WC1E 6AE, United Kingdom
2 Institute of Intelligent Machine, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
* Corresponding author: 181543681@qq.com
Protein-protein interactions (PPIs) are an important part of many life activities in organisms, and their prediction is closely related to protein function, disease occurrence, and disease treatment. To improve the prediction performance for PPIs, an RT-MOS model was constructed based on Random Forest (RF) and Matrix of Sequence (MOS). First, MOS is used to encode the protein sequences into 29-dimensional feature vectors. Then, the RT-MOS prediction model is built with random forest, and the model is optimized and evaluated using the test set. Finally, the optimized RT-MOS model is used for prediction. The experimental results show that the accuracy of the RT-MOS model on the benchmark dataset and the non-redundant dataset is 97.18% and 91.34%, respectively, and that its accuracy on four external datasets (C. elegans, Drosophila, E. coli, and H. sapiens) is 96.21%, 97.86%, 97.54%, and 97.75%, respectively. In comparison with existing methods, RT-MOS is found to be superior. The results indicate that the RT-MOS model saves time, resists overfitting, and achieves high accuracy, making it suitable for large-scale PPI prediction.
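The three steps in the abstract (encode sequences, train a random forest, evaluate accuracy) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The abstract does not define the MOS encoding, so `encode_sequence` below is a hypothetical stand-in that uses plain amino-acid composition (20 values per protein rather than 29). The small random forest, all function names, and the synthetic data from `make_toy_pairs` are likewise assumptions made for illustration only.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_sequence(seq):
    # Stand-in encoder: amino-acid composition fractions (NOT the paper's MOS).
    counts = Counter(seq)
    return [counts[a] / max(len(seq), 1) for a in AMINO_ACIDS]

def encode_pair(seq_a, seq_b):
    # A protein pair becomes one vector by concatenating both encodings.
    return encode_sequence(seq_a) + encode_sequence(seq_b)

def gini(labels):
    # Gini impurity of a list of 0/1 labels.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, rng, depth=0, max_depth=5, n_feats=6):
    # Grow one tree; each node considers only a random subset of features.
    majority = int(2 * sum(y) >= len(y))
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in rng.sample(range(len(X[0])), min(n_feats, len(X[0]))):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best
    children = [
        build_tree([X[i] for i in idx], [y[i] for i in idx],
                   rng, depth + 1, max_depth, n_feats)
        for idx in (left, right)
    ]
    return (f, t, children[0], children[1])

def predict_tree(node, x):
    # Internal nodes are tuples; leaves are 0/1 class labels.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def train_forest(X, y, rng, n_trees=15):
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample the training set with replacement for each tree.
        idx = rng.choices(range(len(X)), k=len(X))
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict(forest, x):
    # Majority vote over all trees.
    votes = sum(predict_tree(tree, x) for tree in forest)
    return int(2 * votes >= len(forest))

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def make_toy_pairs(n, rng, length=100):
    # Synthetic data only: label-1 pairs are biased toward hydrophobic residues.
    def seq(biased):
        return "".join(
            rng.choice("AILMFVW") if biased and rng.random() < 0.8
            else rng.choice(AMINO_ACIDS)
            for _ in range(length)
        )
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append(encode_pair(seq(label), seq(label)))
        y.append(label)
    return X, y

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_toy_pairs(120, rng)
    forest = train_forest(X[:80], y[:80], rng)
    preds = [predict(forest, x) for x in X[80:]]
    print("held-out accuracy on toy data:", accuracy(y[80:], preds))
```

The accuracy printed here reflects only the artificial toy data and says nothing about the results reported in the abstract. For real work, a maintained random forest implementation and the paper's actual MOS encoding would replace the stand-ins.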
Key words: Random forest / Matrix of Sequence / Protein-protein interaction
© The Authors, published by EDP Sciences, 2022
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.