Current Biotechnology ›› 2023, Vol. 13 ›› Issue (4): 645-653. DOI: 10.19586/j.2095-2341.2022.0007
• Techniques and Methods •
Yunpeng MA, Jing ZHU, Xinghua CUI
Received: 2022-01-17
Accepted: 2022-03-09
Online: 2023-07-25
Published: 2023-08-03
Corresponding author: Jing ZHU
About the author: Yunpeng MA, E-mail: 1020264950@qq.com
Yunpeng MA, Jing ZHU, Xinghua CUI. Content Estimating of Microbial Dissolved Organic Carbon Based on Machine Learning[J]. Current Biotechnology, 2023, 13(4): 645-653.
| Sample | OTU_401 | OTU_12 | OTU_960 | OTU_20 | OTU_11 | OTU_6 | OTU_25 | DOC content/(mg·g⁻¹) |
|---|---|---|---|---|---|---|---|---|
| Sample 1 | 0 | 38 | 1 | 17 | 0 | 27 | 0 | 10.46 |
| Sample 2 | 140 | 28 | 0 | 0 | 0 | 3 | 0 | 7.00 |
| Sample 3 | 262 | 8 | 109 | 11 | 0 | 67 | 18 | 5.00 |
| Sample 4 | 102 | 9 | 0 | 1 | 0 | 2 | 0 | 8.42 |
| Sample 5 | 2 | 26 | 6 | 73 | 0 | 0 | 0 | 9.46 |
Table 1 Partial sample of the OTU (operational taxonomic unit) table
| Algorithm | RMSE | MAE | R² |
|---|---|---|---|
| Lasso regression | 2.4606 | 1.9791 | 0.3052 |
| Elastic net regression | 2.3058 | 1.8044 | 0.3905 |
| Support vector machine | 2.3893 | 1.8942 | 0.3421 |
| Decision tree | 2.7792 | 2.1314 | 0.0650 |
| K-nearest neighbors | 2.7486 | 2.0970 | 0.1467 |
| Multilayer perceptron | 2.5490 | 2.1165 | 0.2622 |
| Extra trees | 1.9955 | 1.5518 | 0.5399 |
| Extreme gradient boosting (XGBoost) | 2.0491 | 1.5760 | 0.5081 |
| Random forest | 1.9791 | 1.5154 | 0.5445 |
| AdaBoost | 2.0734 | 1.6205 | 0.5031 |
| Bagging | 2.1035 | 1.6229 | 0.4772 |
| Gradient boosting decision tree | 1.9554 | 1.4724 | 0.5585 |
Table 2 Prediction results of multiple machine learning models
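
The three score columns in Table 2 are standard regression metrics. As a minimal sketch, assuming the usual definitions of RMSE, MAE, and the coefficient of determination R² (this page does not state the formulas), the following computes them for the five DOC values listed in Table 1. The `y_pred` values are hypothetical and chosen only for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R2) for paired observations and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n           # mean absolute error
    ss_res = sum(r * r for r in residuals)             # residual sum of squares
    rmse = math.sqrt(ss_res / n)                       # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                         # coefficient of determination
    return rmse, mae, r2

# DOC content (mg/g) of samples 1-5 in Table 1.
y_true = [10.46, 7.00, 5.00, 8.42, 9.46]
# Hypothetical predictions, not taken from the study.
y_pred = [9.46, 8.00, 6.00, 8.42, 9.46]

rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```

Lower RMSE and MAE and higher R² indicate a better fit, which is the sense in which the gradient boosting decision tree row leads Table 2 on all three columns.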
| OTU ID | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| OTU_401 | OTU_12 | OTU_960 | OTU_20 | OTU_11 | OTU_6 | OTU_40 | OTU_53 | OTU_150 | OTU_55 |
| OTU_57 | OTU_202 | OTU_160 | OTU_574 | OTU_249 | OTU_95 | OTU_188 | OTU_221 | OTU_1469 | OTU_3824 |
| OTU_273 | OTU_389 | OTU_292 | OTU_636 | OTU_23 | OTU_101 | OTU_4022 | OTU_539 | OTU_61 | OTU_146 |
| OTU_181 | OTU_100 | OTU_120 | OTU_81 | OTU_262 | OTU_1259 | OTU_616 | OTU_5 | OTU_1019 | OTU_1032 |
| OTU_138 | OTU_16 | OTU_167 | OTU_170 | OTU_1914 | OTU_21 | OTU_22 | OTU_220 | OTU_227 | OTU_24 |
| OTU_267 | OTU_29 | OTU_309 | OTU_313 | OTU_329 | OTU_3858 | OTU_473 | OTU_474 | OTU_534 | OTU_597 |
| OTU_70 | OTU_82 | OTU_98 | OTU_1 | OTU_15 | OTU_8 | OTU_7 | OTU_13 | OTU_18 | OTU_26 |
| OTU_10 | OTU_9 | OTU_45 | OTU_1033 | OTU_44 | OTU_193 | OTU_35 | OTU_32 | OTU_27 | OTU_131 |
| OTU_28 | OTU_51 | OTU_84 | OTU_5179 | OTU_56 | OTU_54 | OTU_77 | OTU_75 | OTU_94 | OTU_1297 |
| OTU_1111 | OTU_1200 | OTU_1974 | OTU_103 | OTU_2139 | OTU_950 | OTU_106 | OTU_235 | OTU_251 | OTU_431 |
| OTU_358 | OTU_2512 | OTU_713 | OTU_3826 | OTU_179 | OTU_211 | OTU_1119 | OTU_1569 | OTU_201 | OTU_2586 |
| OTU_3002 | OTU_320 | OTU_953 | OTU_1509 | OTU_226 | OTU_347 | OTU_169 | OTU_470 | OTU_293 | OTU_5841 |
| OTU_363 | OTU_357 | OTU_407 | OTU_458 | OTU_372 | OTU_1052 | OTU_581 | OTU_652 | OTU_5988 | OTU_1550 |
| OTU_545 | OTU_698 | OTU_1348 | OTU_5531 | OTU_4794 | OTU_2669 | OTU_516 | OTU_994 | OTU_4277 | OTU_5059 |
Table 3 OTUs selected by RFE-FIS (GBDT) feature selection
| Feature selection method | Algorithm | RMSE | MAE | R² |
|---|---|---|---|---|
| RFE(RF) | Gradient boosting decision tree | 1.9407 | 1.4786 | 0.5792 |
| | Extra trees | 1.9631 | 1.5010 | 0.5669 |
| | Random forest | 1.9585 | 1.4860 | 0.5669 |
| RFE(GBDT) | Gradient boosting decision tree | 1.8212 | 1.3775 | 0.6183 |
| | Extra trees | 1.8551 | 1.4205 | 0.6015 |
| | Random forest | 1.9058 | 1.4537 | 0.5818 |
| RFE(ET) | Gradient boosting decision tree | 1.9543 | 1.4876 | 0.5563 |
| | Extra trees | 1.8740 | 1.4259 | 0.5976 |
| | Random forest | 1.9365 | 1.4742 | 0.5661 |
| FIS(GBDT) | Gradient boosting decision tree | 1.8644 | 1.4121 | 0.6013 |
| | Extra trees | 1.9371 | 1.4993 | 0.5667 |
| | Random forest | 1.9564 | 1.4938 | 0.5558 |
| RFE-FIS(GBDT) | Gradient boosting decision tree | 1.8188 | 1.3868 | 0.6203 |
| | Extra trees | 1.9146 | 1.4663 | 0.5778 |
| | Random forest | 1.9247 | 1.4593 | 0.5702 |
Table 4 Model prediction results
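
Several rows of Table 4 rely on recursive feature elimination (RFE). The following is a minimal, generic sketch of the RFE loop, not the authors' implementation: score the remaining features, drop the weakest, and repeat until the target count is reached. The function names are hypothetical, and `abs_pearson_importance` is a toy stand-in for a model-based importance. In a scikit-learn workflow the callback would typically refit a tree ensemble each round and return its `feature_importances_`. The FIS stage is not reproduced because its definition does not appear on this page. The data are the five samples of Table 1, far too few for real use.

```python
import math

def abs_pearson_importance(columns, y):
    """Toy importance: |Pearson r| between each feature column and the target."""
    n = len(y)
    mean_y = sum(y) / n
    dy = [v - mean_y for v in y]
    norm_y = math.sqrt(sum(d * d for d in dy))
    scores = {}
    for name, values in columns.items():
        mean_x = sum(values) / n
        dx = [v - mean_x for v in values]
        norm_x = math.sqrt(sum(d * d for d in dx))
        if norm_x == 0.0 or norm_y == 0.0:
            scores[name] = 0.0  # a constant column carries no signal
        else:
            cov = sum(a * b for a, b in zip(dx, dy))
            scores[name] = abs(cov) / (norm_x * norm_y)
    return scores

def recursive_feature_elimination(columns, y, importance_fn, n_keep, step=1):
    """Drop the `step` least important features per round until `n_keep` remain."""
    remaining = list(columns)
    while len(remaining) > n_keep:
        # Re-score using only the features that are still in play.
        scores = importance_fn({name: columns[name] for name in remaining}, y)
        n_drop = min(step, len(remaining) - n_keep)
        weakest = set(sorted(remaining, key=lambda name: scores[name])[:n_drop])
        remaining = [name for name in remaining if name not in weakest]
    return remaining

# OTU counts and DOC content for samples 1-5, copied from Table 1.
otu_counts = {
    "OTU_401": [0, 140, 262, 102, 2],
    "OTU_12": [38, 28, 8, 9, 26],
    "OTU_960": [1, 0, 109, 0, 6],
    "OTU_20": [17, 0, 11, 1, 73],
    "OTU_11": [0, 0, 0, 0, 0],
    "OTU_6": [27, 3, 67, 2, 0],
    "OTU_25": [0, 0, 18, 0, 0],
}
doc = [10.46, 7.00, 5.00, 8.42, 9.46]

selected = recursive_feature_elimination(
    otu_counts, doc, abs_pearson_importance, n_keep=3
)
print(selected)
```

Because the toy importance does not depend on which other features remain, the loop here reduces to a ranking. With a refitted model the scores change after every elimination round, which is the reason for recomputing them inside the loop.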
| Parameter | Search range | Search step |
|---|---|---|
| learning_rate | 0.01 to 0.2 | 0.01 |
| n_estimators | 100 to 1000 | 1.00 |
| max_depth | 1 to 10 | 1.00 |
Table 5 Model parameter grid search range
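
Taking Table 5 at face value, the search space can be enumerated directly. The sketch below builds the grid and runs a generic exhaustive search over it. `frange`, `grid_size`, `grid_search`, and `score_fn` are hypothetical names: `score_fn` stands in for whatever cross-validated error the study minimized, which this page does not specify.

```python
from itertools import product

def frange(start, stop, step, ndigits=2):
    """Inclusive float range, rounded to avoid accumulation error."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, ndigits) for i in range(count)]

# Ranges and steps as listed in Table 5.
PARAM_GRID = {
    "learning_rate": frange(0.01, 0.2, 0.01),
    "n_estimators": list(range(100, 1001, 1)),
    "max_depth": list(range(1, 11, 1)),
}

def grid_size(grid):
    """Number of parameter combinations in the grid."""
    size = 1
    for values in grid.values():
        size *= len(values)
    return size

def grid_search(grid, score_fn):
    """Return (best_params, best_score), where lower scores are better."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

print(grid_size(PARAM_GRID))

# Toy usage on a small grid with a made-up objective (assumption, not study data).
small_grid = {"learning_rate": [0.01, 0.1], "max_depth": [1, 2, 3]}
toy_score = lambda p: abs(p["learning_rate"] - 0.1) + abs(p["max_depth"] - 2)
best_params, best_score = grid_search(small_grid, toy_score)
print(best_params, best_score)
```

With the steps as tabulated, the full grid holds 20 × 901 × 10 = 180 200 combinations, so each one would require a separate model fit under exhaustive search.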
| Model state | RMSE | MAE | R² |
|---|---|---|---|
| RFE-FIS(GBDT) | 1.8188 | 1.3868 | 0.6203 |
| GS-RFE-FIS(GBDT) | 1.7220 | 1.2934 | 0.6599 |
Table 6 Precision comparison after parameter optimization
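
The size of the gain in Table 6 is easier to judge as a relative change. A short worked calculation using only the tabulated values follows; `relative_change` is a hypothetical helper name.

```python
# Values copied from Table 6.
baseline = {"RMSE": 1.8188, "MAE": 1.3868, "R2": 0.6203}  # RFE-FIS(GBDT)
tuned = {"RMSE": 1.7220, "MAE": 1.2934, "R2": 0.6599}     # GS-RFE-FIS(GBDT)

def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0

changes = {name: relative_change(baseline[name], tuned[name]) for name in baseline}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

By this calculation, grid search lowers RMSE by roughly 5.3% and MAE by roughly 6.7%, and raises R² by roughly 6.4%, relative to the untuned RFE-FIS(GBDT) model.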