Stacking machine learning model for estimating hourly PM2.5 in China based on Himawari 8 aerosol optical depth data

Sci Total Environ. 2019 Dec 20:697:134021. doi: 10.1016/j.scitotenv.2019.134021. Epub 2019 Aug 22.

Abstract

Aerosol optical depth (AOD) from polar-orbiting satellites and meteorological factors have been widely used to estimate surface concentrations of particulate matter with an aerodynamic diameter <2.5 μm (PM2.5). However, estimates with high temporal resolution are still lacking because of the limitations of satellite observations. Here, we used AOD data with a temporal resolution of 1 h from the geostationary satellite Himawari 8 to overcome this problem. We developed a stacking model that contained three machine learning submodels, namely, AdaBoost, XGBoost and random forest, stacked through a multiple linear regression model. We then estimated the hourly concentrations of PM2.5 in Central and Eastern China. The accuracy evaluation showed that the proposed stacking model performed better than the single models when applied to the test set, with an average coefficient of determination (R²) of 0.85 and a root-mean-square error (RMSE) of 17.3 μg/m³. Model accuracy peaked at 14:00 (local time), with an R² of 0.92 and an RMSE of 12.9 μg/m³. In addition, the spatial and temporal distributions of PM2.5 in Central and Eastern China were plotted in this study. The North China Plain was determined to be the most polluted area in China, with an annual mean daytime PM2.5 concentration of 58 μg/m³. Moreover, PM2.5 pollution was highest in winter, with an average concentration of 73 μg/m³.
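
The stacking step described in the abstract, in which submodel outputs are combined through a multiple linear regression and scored with R² and RMSE, can be illustrated with a minimal sketch. This is not the authors' implementation. The three submodel prediction lists below are synthetic stand-ins for AdaBoost, XGBoost and random forest outputs, and the function names (`fit_meta_learner`, `stack_predict`, `rmse`, `r_squared`) are hypothetical. The meta-learner is fitted on the same samples it is scored on purely for brevity; a real workflow would presumably fit it on held-out or out-of-fold submodel predictions, which the abstract does not specify.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_meta_learner(sub_preds, y):
    """Least-squares fit of y ~ b0 + sum_k b_k * sub_preds[k] (normal equations)."""
    cols = [[1.0] * len(y)] + [list(p) for p in sub_preds]  # intercept + submodels
    k = len(cols)
    xtx = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * t for u, t in zip(cols[i], y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def stack_predict(coefs, sub_preds):
    """Combine submodel predictions with the fitted intercept and weights."""
    n = len(sub_preds[0])
    return [coefs[0] + sum(c * p[i] for c, p in zip(coefs[1:], sub_preds))
            for i in range(n)]


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data only (NOT from the study): "true" PM2.5 values in ug/m3 and
# three noisy, differently biased submodel predictions.
random.seed(0)
truth = [random.uniform(10.0, 150.0) for _ in range(200)]
sub_preds = [
    [t + random.gauss(5.0, 15.0) for t in truth],
    [0.9 * t + random.gauss(0.0, 15.0) for t in truth],
    [t + random.gauss(-3.0, 15.0) for t in truth],
]

# In-sample fit for brevity; see the caveat in the text above.
coefs = fit_meta_learner(sub_preds, truth)
stacked = stack_predict(coefs, sub_preds)

for name, pred in zip(("sub A", "sub B", "sub C", "stacked"), sub_preds + [stacked]):
    print(f"{name}: R2={r_squared(truth, pred):.3f}, RMSE={rmse(truth, pred):.2f}")
```

Because each single submodel is a special case of the linear combination (weight 1 on itself, 0 elsewhere), the least-squares blend cannot have a higher in-sample RMSE than any one submodel. Whether that advantage carries over to unseen data is what a test-set evaluation such as the one reported in the abstract checks.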

Keywords: Air pollution; Himawari 8; Hourly PM2.5; Stacking model.