Stacking machine learning model for estimating hourly PM2.5 in China based on Himawari 8 aerosol optical depth data

Jiangping Chen; Jianhua Yin; Lin Zang; Taixin Zhang; Mengdi Zhao

doi:10.1016/j.scitotenv.2019.134021

Sci Total Environ. 2019 Dec 20:697:134021. doi: 10.1016/j.scitotenv.2019.134021. Epub 2019 Aug 22.

Authors

Jiangping Chen¹, Jianhua Yin², Lin Zang³, Taixin Zhang¹, Mengdi Zhao¹

Affiliations

¹ School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China.
² School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China. Electronic address: yinjianhua@whu.edu.cn.
³ Chinese Antarctic Center of Surveying and Mapping, Wuhan University, Wuhan 430079, China.

PMID: 31484095

Abstract

Aerosol optical depth (AOD) from polar-orbiting satellites and meteorological factors have been widely used to estimate surface concentrations of particulate matter with an aerodynamic diameter <2.5 μm (PM_2.5). However, estimates with high temporal resolution remain lacking because of the limitations of these satellite observations. Here, we used AOD data with a temporal resolution of 1 h from the geostationary satellite Himawari 8 to overcome this problem. We developed a stacking model that combines three machine learning submodels, namely AdaBoost, XGBoost, and random forest, through a multiple linear regression model, and used it to estimate hourly PM_2.5 concentrations in Central and Eastern China. The accuracy evaluation showed that the proposed stacking model outperformed each single model on the test set, with an average coefficient of determination (R²) of 0.85 and a root-mean-square error (RMSE) of 17.3 μg/m³. Model accuracy peaked at 14:00 (local time), with an R² of 0.92 and an RMSE of 12.9 μg/m³. In addition, the spatial and temporal distributions of PM_2.5 in Central and Eastern China were mapped. The North China Plain was identified as the most polluted area in China, with an annual mean daytime PM_2.5 concentration of 58 μg/m³. Moreover, PM_2.5 pollution was highest in winter, with an average concentration of 73 μg/m³.
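
The stacking step and the two accuracy metrics described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the PM_2.5 values and the three submodel outputs below are simulated (an assumption made only so the example runs on its own), and the function names `fit_meta`, `predict_meta`, `rmse`, and `r2` are hypothetical. Only the general idea comes from the abstract: a multiple linear regression combines the predictions of three submodels, and the result is scored with R² and RMSE.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of y (here ug/m3)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fit_meta(sub_preds, y):
    """Fit the multiple linear regression meta-learner by least squares.

    sub_preds has shape (n_samples, n_submodels). The returned vector
    holds the intercept followed by one weight per submodel.
    """
    design = np.column_stack([np.ones(len(y)), sub_preds])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_meta(coef, sub_preds):
    """Combine submodel predictions with the fitted intercept and weights."""
    return coef[0] + sub_preds @ coef[1:]


# --- Simulated data (assumption, not from the paper) ---
rng = np.random.default_rng(0)
n = 2000
y = rng.gamma(shape=2.0, scale=25.0, size=n)  # synthetic "observed" PM2.5

# Stand-ins for the AdaBoost, XGBoost and random forest outputs:
# the truth distorted by a different bias and noise level each.
sub_preds = np.column_stack([
    0.90 * y + 5.0 + rng.normal(0.0, 20.0, n),
    1.05 * y - 3.0 + rng.normal(0.0, 18.0, n),
    0.95 * y + rng.normal(0.0, 22.0, n),
])

# Fit the meta-learner on one part of the data, evaluate on the rest.
fit_idx = np.arange(0, 1500)
test_idx = np.arange(1500, n)
coef = fit_meta(sub_preds[fit_idx], y[fit_idx])
stacked_fit = predict_meta(coef, sub_preds[fit_idx])
stacked_test = predict_meta(coef, sub_preds[test_idx])

print("intercept and weights:", np.round(coef, 3))
print("stacked test R2:", round(r2(y[test_idx], stacked_test), 3))
print("stacked test RMSE:", round(rmse(y[test_idx], stacked_test), 2))
```

In a real stacking pipeline the meta-learner would typically be fitted on out-of-fold predictions of the submodels, so that it never sees predictions made on data a submodel was trained on; scikit-learn's `StackingRegressor` handles this through cross-validation. Whether the paper used that exact procedure is not stated in the abstract.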

Keywords: Air pollution; Himawari 8; Hourly PM_2.5; Stacking model.