Performance evaluation of preprocessing techniques utilizing expert information in multivariate calibration

Talanta. 2014 Apr:121:105-12. doi: 10.1016/j.talanta.2013.12.053. Epub 2014 Jan 2.

Abstract

Partial Least Squares (PLS) regression is one of the most used methods for extracting chemical information from Near Infrared (NIR) spectroscopic measurements. The success of a PLS calibration relies largely on the representativeness of the calibration data set. This is not trivial, because not only the expected variation in the analyte of interest, but also the variation of other contributing factors (interferents) should be included in the calibration data. This also implies that changes in interferent concentrations not covered in the calibration step can deteriorate the prediction ability of the calibration model. Several researchers have suggested that PLS models can be robustified against changes in the interferent structure by incorporating expert knowledge in the preprocessing step with the aim to efficiently filter out the spectral influence of the spectral interferents. However, these methods have not yet been compared against each other. Therefore, in the present study, various preprocessing techniques exploiting expert knowledge were compared on two experimental data sets. In both data sets, the calibration and test set were designed to have a different interferent concentration range. The performance of these techniques was compared to that of preprocessing techniques which do not use any expert knowledge. Using expert knowledge was found to improve the prediction performance for both data sets. For data set-1, the prediction error improved nearly 32% when pure component spectra of the analyte and the interferents were used in the Extended Multiplicative Signal Correction framework. Similarly, for data set-2, nearly 63% improvement in the prediction error was observed when the interferent information was utilized in Spectral Interferent Subtraction preprocessing.

Keywords: Extended Multiplicative Signal Correction; External Parameter Orthogonalization; Generalized Least Squares Weighting; Glucose; Pure component spectrum; Spectral Interference Subtraction.