Abstract
Estimating real-time sources of pollutants and evaluating the effectiveness of agricultural conservation practices are crucial for preventing contamination of water resources. Data-driven machine learning (ML) models may predict daily water pollutant loads more effectively than the process-based models currently in use, and may not require the many data types (e.g., soil, weather, and crop management) that those models need. In this study, daily hydro-meteorological data (e.g., temperature, rainfall, irrigation, and runoff) were collected from 2016 to 2022 from paired control and treatment fields near Manila, Arkansas, each 7.81 ha in area; cover crops and filter strips were used for pollution mitigation in the treatment field only. Pollutant loads, including sediment, total phosphorus (TP), and total nitrogen (TN), were measured daily and used to train and test nine ML models: Multiple Linear Regression (MLR), K-Nearest Neighbors (KNN), Random Forest (RF), Extreme Gradient Boosting (XGB), Histogram Gradient Boosting (HGB), Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and a hybrid CNN-LSTM model. Results showed that the hybrid model best predicted runoff in the control field (coefficient of determination, R² = 0.87), and KNN best predicted runoff in the treatment field (R² = 0.82). LSTM performed best for sediment prediction in both fields, while RF and ANN performed best for TP and TN predictions, respectively. Model performance declined from runoff to sediment to nutrient loads due to error propagation. Advanced models (e.g., LSTM, CNN, and the hybrid) outperformed conventional ML models and showed robustness against overfitting. Shapley Additive exPlanations (SHAP) analysis identified precipitation and runoff as key drivers of pollutant loads.
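The models above are compared by their coefficient of determination, R² = 1 - SS_res / SS_tot, on test data. A minimal sketch of that metric and of picking the best-scoring model follows; the observed runoff values, the model names `model_a` and `model_b`, and their predictions are hypothetical, and the sketch does not reproduce the study's training or testing pipeline.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = fmean(observed)
    # Residual sum of squares: squared prediction errors
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of observations around their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily runoff (mm) on a held-out test period (illustrative only)
observed = [0.0, 2.0, 5.0, 1.0, 8.0]
# Hypothetical predictions from two hypothetical models
predictions = {
    "model_a": [0.5, 1.5, 4.0, 1.0, 7.0],
    "model_b": [1.0, 3.0, 3.0, 2.0, 5.0],
}

scores = {name: r_squared(observed, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
print(f"best model: {best} (R^2 = {scores[best]:.3f})")
```

An R² near 1 means the predictions explain most of the variance in the observations; a model can score below 0 if it predicts worse than the observed mean.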
Conservation practices significantly reduced pollutant loads: relative to the control field, the treatment field had 33% less runoff, 46% less sediment, 47% less TP, and 44% less TN. These findings demonstrate that calibrated ML models can predict pollutant loads from hydrometeorological data, which can support site-specific monitoring of water pollutants in agricultural fields. The results also highlight the tangible benefits of conservation practices in mitigating agricultural pollution.
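The reductions above are expressed as percentages of the control-field load. A minimal sketch of that arithmetic follows, assuming each reduction is computed as 100 x (control - treatment) / control on loads aggregated over the same period and in the same units; the load values below are hypothetical and serve only to illustrate the calculation, not the study's data.

```python
def percent_reduction(control_load: float, treatment_load: float) -> float:
    """Percent reduction of the treatment-field load relative to the control-field load.

    Assumes both loads cover the same period and use the same units.
    """
    if control_load <= 0:
        raise ValueError("control load must be positive")
    return 100.0 * (control_load - treatment_load) / control_load


# Hypothetical cumulative loads as (control, treatment); illustrative values only
loads = {
    "runoff": (100.0, 67.0),
    "sediment": (200.0, 108.0),
}

for name, (control, treatment) in loads.items():
    print(f"{name}: {percent_reduction(control, treatment):.1f}% reduction")
```

Because the paired fields have equal areas (7.81 ha each), loads can be compared directly without normalizing by area.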
| Original language | English |
|---|---|
| Article number | 103479 |
| Journal | Ecological Informatics |
| Volume | 92 |
| State | Published - Dec 1 2025 |
Keywords
- Agricultural conservation practices
- Cover crops and filter strips
- Feature analysis
- Modeling
- SHAP analysis
- Water quality
Fingerprint
Dive into the research topics of 'Machine learning models for water quality: Predicting pollutant loads and assessing conservation practice's effectiveness in agricultural fields'. Together they form a unique fingerprint.