Microbial communities have an important impact on the macroscopic properties of the environment. However, microbial data are high-dimensional, complex and sparse, which poses new challenges for understanding the relationship between microorganisms and the ecological environment. The development of machine learning and the widespread adoption of second-generation DNA sequencing technology provide a new approach to this problem. In this study, soil microbiome and dissolved organic carbon (DOC) data from 308 samples collected in 44-day plant litter decomposition experiments were used, and 1 709 bacterial and microbial operational taxonomic units (OTUs) served as features to build 12 commonly used machine learning models. The embedded method, the wrapper method and an embedded-wrapper fusion method were used for feature selection, and the gradient boosting decision tree (GBDT) was selected as the optimal model for parameter optimization. Root mean square error, mean absolute error and linear goodness of fit were used as evaluation metrics. The results showed that feature selection reduced the data dimension and improved model accuracy. In the simulation experiment, the embedded-wrapper fusion method performed best among the feature selection methods applied to the model. A prediction model for dissolved organic carbon was constructed by combining the embedded-wrapper fusion method with the gradient boosting decision tree, and the validity of the model was verified experimentally. The results provide a new way to estimate dissolved organic carbon from bacterial and microbial data using machine learning.
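The three evaluation metrics named above can be written down in a few lines. The following is a minimal sketch, not the study's code: it assumes that "linear goodness of fit" refers to the coefficient of determination (R²), and the function names and toy values are hypothetical and are not taken from the experiments.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: this is what the abstract means by "linear goodness of fit".
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not data from the study).
observed_doc = [1.0, 2.0, 3.0, 4.0]
predicted_doc = [1.0, 2.0, 3.0, 6.0]

print("RMSE:", rmse(observed_doc, predicted_doc))
print("MAE:", mae(observed_doc, predicted_doc))
print("R^2:", round(r_squared(observed_doc, predicted_doc), 3))
```

Lower RMSE and MAE and a higher R² indicate a better fit. RMSE penalizes large individual errors more heavily than MAE, which is one reason to report both.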