Publication Date

10-1-2025

Document Type

Article

Publication Title

Water Practice and Technology

Volume

20

Issue

10

DOI

10.2166/wpt.2025.124

First Page

2042

Last Page

2059

Abstract

Uncontrolled anthropogenic activities have increased river pollution, degrading water quality and creating a growing need for continuous monitoring, which is time-consuming and costly. This study evaluated the accuracy of machine learning (ML) models in predicting dissolved oxygen (DO) levels, with the aim of reducing field monitoring effort and minimizing the number of parameters that must be measured. Regression models (Huber, Linear, Ridge), Bagging methods (Extra Trees, Random Forest, Decision Trees), and Boosting techniques (Gradient Boosting, Light Gradient Boosting, AdaBoost) were tested using data from three sites in the Manawatu Catchment, New Zealand. The data, spanning 1989–2014, included 12 water quality variables and were split into 70% training and 30% testing sets. Model performance was assessed using the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), and coefficient of determination (R²). Results showed that a reduced set of water quality indicators, particularly temperature, pH, and nutrients, can effectively predict DO. All three regression models offered better value for predicting DO within the Manawatu Catchment than the Bagging and Boosting regressors, with R² > 0.95. This study presents a novel, data-driven approach to water quality monitoring, demonstrating that ML can accurately predict DO from a reduced set of indicators, making monitoring more computationally efficient, labor-saving, and cost-effective for water resource managers.
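The evaluation workflow described in the abstract (a 70/30 train/test split, the three linear regressors, and NSE, RMSE, and R² scoring) can be sketched with scikit-learn as below. This is a minimal illustration, not the authors' code: the function names `nash_sutcliffe`, `rmse`, and `evaluate_models`, the hyperparameters, and the synthetic temperature/pH/nutrient data are all hypothetical stand-ins, and no Manawatu measurements are used.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / variance)


def rmse(observed, simulated):
    """Root mean square error between observed and simulated values."""
    return float(np.sqrt(mean_squared_error(observed, simulated)))


def evaluate_models(X, y, seed=0):
    """Fit Huber, Linear and Ridge regressors on a 70/30 split and score them.

    Hyperparameters here are illustrative assumptions, not the study's settings.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    models = {
        "Huber": HuberRegressor(max_iter=1000),
        "Linear": LinearRegression(),
        "Ridge": Ridge(alpha=1.0),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        results[name] = {
            "NSE": nash_sutcliffe(y_test, predicted),
            "RMSE": rmse(y_test, predicted),
            "R2": float(r2_score(y_test, predicted)),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the Manawatu dataset): hypothetical
    # temperature, pH and nutrient columns with an assumed linear DO relation.
    rng = np.random.default_rng(42)
    n = 500
    temperature = rng.uniform(5, 25, n)
    ph = rng.uniform(6.5, 8.5, n)
    nutrient = rng.uniform(0, 2, n)
    X = np.column_stack([temperature, ph, nutrient])
    y = 14.0 - 0.3 * temperature + 0.5 * ph - 0.8 * nutrient + rng.normal(0, 0.2, n)
    for name, scores in evaluate_models(X, y).items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Note that scikit-learn's `r2_score` uses the same formula as NSE, so the two values coincide in this sketch; studies that instead report R² as the squared Pearson correlation between observed and predicted values will obtain a different number.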

Keywords

dissolved oxygen, environmental monitoring, feature selection, machine learning, prediction, water quality modeling

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Marketing and Business Analytics
