METABOLOMICS 4 Data Handling
To process and interpret the complex data obtained in metabolomic studies, advanced data-handling algorithms are needed. Data handling comprises three stages: data processing, data pretreatment and data analysis.
Data processing proceeds through multiple stages, such as filtering, peak detection, deconvolution, alignment and normalization. The need for powerful data-processing methods has given rise to numerous commercial and free tools, each implementing one or several steps of the data-processing pipeline.
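Two of these steps, peak detection and normalization, can be illustrated with a deliberately simplified sketch. It assumes a single one-dimensional intensity trace for peak detection and a peak table with samples as rows and metabolite features as columns for normalization. The function names are hypothetical, and total-intensity normalization is only one of several normalization schemes in use. Real tools also handle noise estimation, baseline correction and peak shape, none of which is modelled here.

```python
import numpy as np


def detect_peaks(intensities, min_height):
    """Return indices of local maxima whose intensity is at least min_height.

    Naive rule (an assumption, not any particular tool's algorithm): a point is a
    peak if it is strictly greater than its left neighbour and at least as large
    as its right neighbour. On a flat top, the first point of the plateau is kept.
    """
    y = np.asarray(intensities, dtype=float)
    interior = y[1:-1]
    rises = interior > y[:-2]      # higher than the left neighbour
    falls = interior >= y[2:]      # not lower than the right neighbour
    tall = interior >= min_height  # crude intensity threshold
    # +1 converts interior positions back to indices in the full trace.
    return np.flatnonzero(rises & falls & tall) + 1


def normalize_total_intensity(peak_table):
    """Scale each sample (row) so that its feature intensities sum to 1.

    This is total-intensity normalization; it assumes every row has a
    non-zero total.
    """
    X = np.asarray(peak_table, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    return X / totals
```

For example, `detect_peaks([0, 1, 3, 1, 0, 2, 5, 2, 0], min_height=2)` returns the indices 2 and 6, the positions of the two maxima.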
Data pretreatment represents another crucial step that can dramatically change the outcome of the data analysis. This procedure typically involves centering and scaling of the original data to eliminate unwanted systematic bias, while maintaining genuine differences in the examined datasets.
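Centering and scaling can be written in a few lines. The sketch below assumes a data matrix with samples as rows and metabolites as columns. It shows mean-centering, autoscaling (unit-variance scaling) and Pareto scaling, which divides by the square root of the standard deviation and so shrinks large features less aggressively than autoscaling. The function names are illustrative, and constant (zero-variance) columns would need to be removed first to avoid division by zero.

```python
import numpy as np


def mean_center(X):
    """Subtract each column's mean so that every metabolite is centred on 0."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)


def autoscale(X):
    """Mean-centre, then divide each column by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return mean_center(X) / X.std(axis=0, ddof=1)


def pareto_scale(X):
    """Mean-centre, then divide each column by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    return mean_center(X) / np.sqrt(X.std(axis=0, ddof=1))
```

After autoscaling, every metabolite has unit variance and contributes equally to the variance-based analysis that follows. After Pareto scaling, each column's standard deviation becomes the square root of its original value, so abundant metabolites keep somewhat more weight.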
Data analysis involves the use of various chemometric tools. Unsupervised pattern-recognition techniques (mainly principal component analysis) are often the first step of data analysis, used to detect patterns in the measured data without prior knowledge of sample classes. Supervised pattern-recognition techniques (e.g. partial least-squares discriminant analysis, linear discriminant analysis), by contrast, use existing information about the membership of samples in a given group (class or category) to classify a new “unknown” sample on the basis of its measurement pattern. The choice of analysis, and therefore the form of its output, depends on the purpose of the investigation.
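The contrast between unsupervised and supervised analysis can be sketched with NumPy alone. `pca_scores` computes principal component scores from the singular value decomposition and never sees the class labels. `fit_fisher_lda` is a plain two-class Fisher linear discriminant that does use the labels. All names are hypothetical. The ridge term is an assumption added because metabolomic data usually have more variables than samples, which makes the within-class scatter matrix singular. PLS-DA, the other supervised method mentioned, is not shown.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Unsupervised: project samples onto their first principal components.

    Columns are (re)centred here, which is harmless if X was already mean-centred.
    Returns scores (samples x components) and explained-variance ratios.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Xc = U @ diag(S) @ Vt; the rows of Vt are the loadings.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


def fit_fisher_lda(X, y, ridge=1e-6):
    """Supervised: two-class Fisher discriminant for labels 0 and 1.

    Returns (w, threshold). Samples whose projection X @ w exceeds the threshold
    are assigned to class 1. The threshold is the midpoint of the projected
    class means, i.e. equal priors are assumed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    D0, D1 = X0 - m0, X1 - m1
    Sw = D0.T @ D0 + D1.T @ D1
    # Small ridge keeps Sw invertible when variables outnumber samples.
    Sw = Sw + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0
    return w, threshold


def predict_lda(model, X_new):
    """Assign each new sample to class 0 or 1 using a fitted (w, threshold) model."""
    w, threshold = model
    return (np.asarray(X_new, dtype=float) @ w > threshold).astype(int)
```

In this sketch, a new sample such as `[5.5, 5.5]` is classified by where its projection onto `w` falls relative to the threshold. PCA would only show whether the samples separate into groups, without assigning labels.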