Chemometric exploration in hyperspectral imaging in the framework of big data and multimodality
Hyperspectral imaging is now widely recognized as a powerful tool in many chemistry-related research areas: it can be applied to samples of very different natures, whatever the spectroscopic technique used. Despite the attractive characteristics of this kind of data, several limitations arise. First, modern instruments can generate huge amounts of data (big datasets). Moreover, fusing different spectroscopic responses acquired on the same sample (multimodality) multiplies the data to be analyzed. This can become a problem: without an appropriate approach, it may be difficult to obtain satisfactory results, and the analysis may even yield a biased view of the analytical reality of the sample. In addition, spectral artifacts may be present in a dataset, and the correction of these imperfections must be taken into account to obtain reliable outcomes. Another important challenge of hyperspectral image analysis is that the simultaneous observation of spectral and spatial information is normally almost impossible, which leads to an incomplete investigation of the sample of interest. Chemometrics is a modern branch of chemistry that is particularly well suited to addressing these limitations. The purpose of this PhD work is to present a series of topics in which challenges related to hyperspectral images are overcome using different chemometric tools. In particular, as will be described, the generation of large amounts of data can be handled with algorithms based on the selection of the purest information (e.g., SIMPLISMA) or on the grouping of similar components into clusters (e.g., k-means clustering).
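To make the clustering idea concrete, the sketch below groups the pixel spectra of a hyperspectral cube with a plain k-means loop written in NumPy. The cube shape, the synthetic two-region data, and the farthest-point initialization are illustrative assumptions for this toy example, not the thesis's actual datasets or implementation; in practice a library routine such as scikit-learn's KMeans would typically be used.

```python
import numpy as np

def _init_centers(X, k):
    """Farthest-point initialization: start from the first spectrum,
    then repeatedly add the spectrum farthest from the chosen centers."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :],
                                  axis=2), axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans_pixels(cube, k, n_iter=50):
    """Cluster the pixel spectra of a (rows x cols x bands) cube with k-means;
    returns a label map of shape (rows, cols) and the centroid spectra."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)          # unfold: one spectrum per row
    centers = _init_centers(X, k)
    for _ in range(n_iter):
        # assign each spectrum to the nearest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update each centroid as the mean spectrum of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(rows, cols), centers

# toy cube: two spatial halves with clearly distinct synthetic spectra
rng = np.random.default_rng(1)
cube = np.zeros((10, 10, 5))
cube[:, :5, :] = 1.0 + 0.01 * rng.standard_normal((10, 5, 5))
cube[:, 5:, :] = 5.0 + 0.01 * rng.standard_normal((10, 5, 5))
label_map, centers = kmeans_pixels(cube, k=2)
```

On this toy cube the two spatial halves end up in two different clusters, illustrating how k-means compresses millions of spectra into a handful of representative components.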
To correct instrumental artifacts such as saturated signals, a methodology based on statistical imputation will be used to reconstruct, in an elegant way, the missing information and thus recover signals that would otherwise be irremediably lost. A significant part of this thesis concerns data acquired by LIBS imaging, a spectroscopic technique currently attracting increasing interest in many research areas but one that has not yet been exploited to its full potential through chemometric approaches. This manuscript presents a general pipeline focused on selecting the most relevant information from such data cubes (necessary given the huge amount of spectral data that can easily be generated) in order to overcome limitations encountered when analyzing this instrumental response. The same approach will then be extended to the fusion of LIBS data with other spectroscopic data. Lastly, an interesting use of the wavelet transform will be presented that extends the analysis from spectral information to spatial information, yielding a more complete chemical investigation.
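As a toy illustration of how a wavelet transform exposes spatial rather than spectral structure, the sketch below implements one level of the 2-D Haar decomposition of a single image channel in NumPy. The Haar family and the averaging normalization are illustrative assumptions for this sketch, not the wavelet choice actually made in the thesis; a library such as PyWavelets would normally be used for multi-level decompositions.

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar wavelet transform of an even-sized image.
    Returns (LL, LH, HL, HH): the approximation sub-band plus the
    horizontal, vertical and diagonal detail sub-bands, each half-size."""
    a = img.astype(float)
    # transform along rows: pairwise averages (low-pass) and differences (high-pass)
    lo = (a[:, ::2] + a[:, 1::2]) / 2.0
    hi = (a[:, ::2] - a[:, 1::2]) / 2.0
    # transform along columns of each row-filtered band
    LL = (lo[::2] + lo[1::2]) / 2.0   # smooth spatial content
    LH = (lo[::2] - lo[1::2]) / 2.0   # horizontal edges
    HL = (hi[::2] + hi[1::2]) / 2.0   # vertical edges
    HH = (hi[::2] - hi[1::2]) / 2.0   # diagonal detail
    return LL, LH, HL, HH

# a spatially flat 4x4 channel puts all its energy in the approximation band
flat = np.full((4, 4), 3.0)
LL, LH, HL, HH = haar2d(flat)
```

For a flat channel the detail bands are zero and LL reproduces the constant value, while for a real sample image the detail bands highlight spatial features such as grain boundaries or inclusions that a purely spectral analysis would ignore.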