Using Random Forest, a machine learning approach to predict nitrogen, phosphorus, and sediment event mean concentrations in urban runoff

J Environ Manage. 2022 Sep 1:317:115412. doi: 10.1016/j.jenvman.2022.115412. Epub 2022 May 29.

Abstract

Estimating pollutant loads from developed watersheds is vitally important to reduce nonpoint source pollution from urban areas, as a key tool in meeting water quality goals is the implementation of Stormwater Control Measures (SCMs). SCMs are selected and sized based on influent pollutant loads. A common method used to estimate pollutant loads in urban runoff is the Event Mean Concentration (EMC) method. In this study, we develop and apply data-driven models using Random Forest (RF), a machine learning approach, to predict Total Nitrogen (TN), Total Phosphorus (TP), Total Suspended Solids (TSS), and Ortho-Phosphorus (Ortho-P) EMCs in urban runoff. The parameters considered in this study were climatological characteristics (i.e., Antecedent Dry Period or ADP, Precipitation Depth or P, Duration or D, and Intensity or I) and catchment characteristics including land use-related parameters including Imperviousness or Imp, Saturated Hydraulic Conductivity or Ksat, and Available Water Capacity or AWC), and site-specific parameters including Slope (S), and Catchment Size (A). Stormwater quality data for this study were obtained from the National Stormwater Quality Database (NSQD), which is the largest repository of stormwater quality data in the U.S. Results demonstrate that land use-related characteristics (i.e., Imp, Ksat, and AWC) were the most effective variables for predicting all EMCs. For TP, TSS, and Ortho-P, site-specific characteristics (S and A) had a greater effect than climatological characteristics (i.e., ADP, P, D, and I). However, for TN, climatological characteristics had a greater effect than site-specific characteristics (S and A). In addition, for TN, TP, and TSS, precipitation characteristics (P, D, and I) were found to be more effective parameters for estimating EMCs than ADP. This study highlights the most influential parameters affecting EMCs which can be used by stakeholders and SCMs designers to improve estimates of nutrients and sediment EMCs. The selection and design of the highest performing SCMs is essential in achieving effective treatment of stormwater, attaining water quality goals, and protecting downstream waterbodies.

Keywords: Data-driven models; Decision trees; Land use; National stormwater quality database (NSQD); Stormwater quality prediction; Variable importance.

MeSH terms

  • Environmental Monitoring / methods
  • Environmental Pollutants*
  • Machine Learning
  • Nitrogen / analysis
  • Phosphorus / analysis
  • Rain
  • Water Movements
  • Water Pollutants, Chemical* / analysis

Substances

  • Environmental Pollutants
  • Nitrogen
  • Phosphorus
  • Water Pollutants, Chemical