Comparing linear and non-linear data-driven approaches in monthly river flow prediction, based on the models SARIMA, LSSVM, ANFIS, and GMDH

Environ Sci Pollut Res Int. 2022 Mar;29(15):21935-21954. doi: 10.1007/s11356-021-17443-0. Epub 2021 Nov 13.

Abstract

River flow variations directly affect the hydro-climatological, environmental, and ecological characteristics of a region. Accurate river flow prediction is therefore critically important for water managers and planners. The present study compares different data-driven models for predicting monthly river flow. Two river catchments in Guilan province, Iran, where rivers play an essential role in agricultural production (mainly rice), are studied. The monthly river flow dataset for 1986-2015 was provided by the Guilan Regional Water Authority. The models belong to two classes: stochastic and machine learning (ML). The stochastic model is the seasonal autoregressive integrated moving average (SARIMA), and the ML models are the least squares support vector machine (LSSVM), the adaptive neuro-fuzzy inference system (ANFIS), and the group method of data handling (GMDH). Model inputs were selected from the flow rates of previous months using the autocorrelation and partial autocorrelation functions (ACF and PACF). The data were divided into training (75%) and testing (25%) sets, and the models were then implemented. Predictions were evaluated using the root mean square error (RMSE), normalized RMSE (NRMSE), and Nash-Sutcliffe (NS) efficiency coefficient. Based on the test-phase values of these criteria (RMSE = 1.138 cms, NRMSE = 0.109, and NS = 0.826), the SARIMA model outperformed its ML competitors. Among the ML models, GMDH performed best (RMSE = 1.290 cms, NRMSE = 0.124, and NS = 0.777) because it has more optimization parameters and a larger sample space for building the network. The models were also evaluated under hydrological drought conditions in both rivers. The results showed that river flow can be predicted well under drought conditions with these models, especially the stochastic SARIMA model.
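The input-selection and data-splitting steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the PACF is computed from the sample ACF with the Durbin-Levinson recursion, a lag is kept when its |PACF| exceeds the approximate 95% band ±1.96/√n, and the 75/25 split is assumed to be chronological (the abstract does not say whether the data were shuffled). The helper names (`acf`, `pacf_from_acf`, `significant_lags`, `lagged_design`, `chronological_split`) and the synthetic series are hypothetical; the series only mimics the 360-month length of 1986-2015 and is not the Guilan data.

```python
import numpy as np


def acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags (r_0 = 1)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(nlags + 1)])


def pacf_from_acf(r):
    """PACF at lags 1..K from r_0..r_K via the Durbin-Levinson recursion."""
    K = len(r) - 1
    out = np.zeros(K)
    phi = np.zeros(K + 1)  # phi[j] holds phi_{k-1,j} for j = 1..k-1
    for k in range(1, K + 1):
        num = r[k] - sum(phi[j] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[j] * r[j] for j in range(1, k))
        phi_kk = num / den
        new_phi = phi.copy()
        for j in range(1, k):
            new_phi[j] = phi[j] - phi_kk * phi[k - j]
        new_phi[k] = phi_kk
        phi = new_phi
        out[k - 1] = phi_kk
    return out


def significant_lags(x, max_lag, z=1.96):
    """Lags whose |PACF| exceeds the approximate 95% band z / sqrt(n) (assumed rule)."""
    p = pacf_from_acf(acf(x, max_lag))
    bound = z / np.sqrt(len(x))
    return [k for k in range(1, max_lag + 1) if abs(p[k - 1]) > bound]


def lagged_design(x, lags):
    """Inputs x[t - l] for each lag l and target x[t], for t = max(lags)..n-1."""
    x = np.asarray(x, dtype=float)
    m = max(lags)
    X = np.column_stack([x[m - l : len(x) - l] for l in lags])
    return X, x[m:]


def chronological_split(X, y, train_frac=0.75):
    """First 75% of rows for training, last 25% for testing, no shuffling (assumption)."""
    n_train = int(len(y) * train_frac)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


# Hypothetical synthetic monthly series (NOT the study's data): 360 months with a
# 12-month seasonal cycle plus AR(1) noise.
rng = np.random.default_rng(0)
n = 360
t = np.arange(n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + rng.normal()
flow = 10 + 4 * np.sin(2 * np.pi * t / 12) + noise

lags = significant_lags(flow, max_lag=24)
X, y = lagged_design(flow, lags)
X_train, y_train, X_test, y_test = chronological_split(X, y)
print("selected lags:", lags)
print("train/test rows:", len(y_train), len(y_test))
```

Keeping the split chronological avoids training on months that follow the test period, which matters when inputs are lagged flows; the ±1.96/√n band is only a large-sample approximation, and the study may have applied a different selection rule.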
According to the NRMSE values (ranging between 0.1 and 0.2), the prediction accuracy falls within an acceptable range, and the study shows promising results for the current approaches. Consequently, comparing the performance of linear stochastic models with that of complex black-box ML models reveals that linear stochastic models are more suitable for monthly river flow prediction in this region.
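The evaluation criteria can be written out as a short Python sketch. RMSE and NS follow their standard definitions; for NRMSE the abstract does not state the normalizer, so dividing RMSE by the mean observed flow is an assumption here (the reported pairs 1.138/0.109 and 1.290/0.124 both imply a denominator of about 10.4 cms, which is consistent with, though not proof of, a shared normalizer such as the mean test-phase flow). The banding in the hypothetical helper `nrmse_band` (below 0.1 excellent, 0.1 to 0.2 good, 0.2 to 0.3 fair, otherwise poor) is a convention often used in the hydrological literature, assumed rather than taken from the study. The metric values in the usage lines are the test-phase figures reported in the abstract.

```python
import numpy as np


def rmse(obs, sim):
    """Root mean square error, in the units of the flow (cms here)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def nrmse(obs, sim):
    """RMSE divided by the mean observed flow (assumed normalization)."""
    return rmse(obs, sim) / float(np.mean(obs))


def nash_sutcliffe(obs, sim):
    """NS = 1 - SSE / total sum of squares about the observed mean (1 is a perfect fit)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    sse = np.sum((obs - sim) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - sse / sst)


def nrmse_band(value):
    """Qualitative band for an NRMSE value (assumed convention, not from the study)."""
    if value < 0.1:
        return "excellent"
    if value < 0.2:
        return "good"
    if value < 0.3:
        return "fair"
    return "poor"


# Test-phase values reported in the abstract: (RMSE [cms], NRMSE, NS).
reported = {"SARIMA": (1.138, 0.109, 0.826), "GMDH": (1.290, 0.124, 0.777)}
# Rank models by NS, highest (best) first, and label each NRMSE.
for model, (r, nr, ns) in sorted(reported.items(), key=lambda kv: -kv[1][2]):
    print(f"{model}: RMSE={r} cms, NRMSE={nr} ({nrmse_band(nr)}), NS={ns}")
```

Ranking by NS places SARIMA ahead of GMDH, in line with the abstract's conclusion, and both NRMSE values fall in the 0.1 to 0.2 band that the abstract describes as an acceptable range.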

Keywords: Group method of data handling; Hydrological simulation; River flow variations; Stochastic models; Support vector machine.

MeSH terms

  • Hydrology
  • Least-Squares Analysis
  • Linear Models
  • Rivers*
  • Support Vector Machine*