I am interested in all aspects of genomic data analysis, although most of the data I work
with comes from microarray or quantitative PCR experiments. My preferred field remains
statistical pattern recognition and its applications to the life sciences.
Current research projects and themes
MAQC-II
MAQC-II is an FDA-led initiative for the standardization of methodology for biomarker identification
studies. There is a lack of accepted standards for biomarker validation, for biological interpretation
of results and for demonstrating comparability of conclusions. The initiative compares methods for
selection and validation of biomarkers from microarray data, paying particular attention to
robustness, flexibility and reproducibility of the classification system.
Besides contributing to the mainstream effort of the project by designing and implementing
a data analysis plan compliant with the FDA's requirements, I also focused on more specific
issues, such as how the complexity or difficulty of a classification problem affects the optimal
combination of feature selection and classification methods.
Selection of control genes
We propose a meta-analysis approach to selecting candidate control genes. This approach has
the advantage of being platform- and normalization-independent and of being able to integrate
predefined lists of genes as well. The first step is to score the genes from a dataset and to
rank them accordingly. Here is a plot showing the scores (color-coded) from a dataset:
R code for scoring and aggregating the gene
ranks from several datasets is available here.
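To make the score, rank and aggregate steps concrete, here is a minimal sketch in Python
(not the R code mentioned above). The scoring rule (coefficient of variation, lower meaning
more stable) and the aggregation rule (mean of normalized ranks) are assumptions chosen for
illustration; the function names are hypothetical and the actual method may differ. Working
on ranks rather than raw scores is what lets datasets from different platforms be combined,
and a predefined gene list can be folded in as one more ranking.

```python
from statistics import mean, pstdev


def stability_scores(expr):
    """Score each gene by its coefficient of variation (assumed scoring rule).

    expr maps a gene name to its list of expression values in one dataset.
    A lower score means a more stable gene, i.e. a better control candidate.
    """
    scores = {}
    for gene, values in expr.items():
        m = mean(values)
        scores[gene] = pstdev(values) / abs(m) if m != 0 else float("inf")
    return scores


def rank_genes(scores):
    """Turn scores into ranks (1 = best); ties are broken by gene name."""
    ordered = sorted(scores, key=lambda g: (scores[g], g))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def aggregate_ranks(datasets, predefined=None):
    """Order genes by the mean of their normalized ranks across datasets.

    predefined is an optional, already ordered list of genes that is
    treated as one additional ranking (assumed integration rule).
    """
    per_gene = {}
    for expr in datasets:
        ranks = rank_genes(stability_scores(expr))
        n = len(ranks)
        for gene, r in ranks.items():
            per_gene.setdefault(gene, []).append(r / n)
    if predefined:
        n = len(predefined)
        for i, gene in enumerate(predefined):
            per_gene.setdefault(gene, []).append((i + 1) / n)
    aggregated = {gene: mean(v) for gene, v in per_gene.items()}
    return sorted(aggregated, key=lambda g: (aggregated[g], g))


# Toy data: gene "A" is the most stable in both datasets.
d1 = {"A": [10, 10, 10], "B": [5, 10, 15], "C": [9, 10, 11]}
d2 = {"A": [100, 101, 99], "B": [10, 50, 90], "C": [100, 110, 90]}
candidates = aggregate_ranks([d1, d2])
```

The first entries of `candidates` are the genes that rank as most stable on average, and
would be the ones to consider as control genes under these assumptions.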
Segmentation of tiling array data
Segmenting tiling array data is a challenging task due to the high level of noise
that affects the measurements. We introduce a wavelet-based denoising step in the
process of segmentation and we demonstrate its effectiveness on simulated and real-world data.
This denoising step has the advantage of improving the accuracy of the segmentation
while also reducing the execution time and memory requirements.
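The idea of denoising before segmenting can be sketched as follows, in Python. This is
only an illustration under stated assumptions: it uses a Haar wavelet with soft
thresholding and the universal threshold, and a naive jump detector stands in for the
segmentation algorithm. None of these choices is claimed to match the wavelet, threshold
or segmentation method used in the actual work, and all function names are hypothetical.

```python
import math
from statistics import median


def haar_forward(signal, levels):
    """Multi-level Haar transform; returns (approximation, list of detail levels)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        a = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        d = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details.append(d)  # details[0] is the finest scale
        approx = a
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    a = list(approx)
    for d in reversed(details):
        out = []
        for ai, di in zip(a, d):
            out.append((ai + di) / math.sqrt(2))
            out.append((ai - di) / math.sqrt(2))
        a = out
    return a


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def denoise(signal, levels=3):
    """Haar denoising with the universal threshold (assumed choice)."""
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = haar_forward(signal, levels)
    # Noise level estimated from the finest details (median absolute deviation).
    sigma = median(abs(d) for d in details[0]) / 0.6745
    t = sigma * math.sqrt(2 * math.log(n))
    shrunk = [[soft_threshold(d, t) for d in level] for level in details]
    return haar_inverse(approx, shrunk)


def segment(signal, min_jump):
    """Placeholder segmentation: indices where the signal jumps by more than min_jump."""
    return [i for i in range(1, len(signal)) if abs(signal[i] - signal[i - 1]) > min_jump]


# Toy probe-intensity profile: a step from 0 to 4 with small alternating noise.
raw = [base + (0.1 if i % 2 == 0 else -0.1)
       for i, base in enumerate([0.0] * 8 + [4.0] * 8)]
smooth = denoise(raw, levels=3)
breakpoints = segment(smooth, min_jump=1.0)
```

On this toy profile the denoised signal is flat within each half, so the only breakpoint
reported is the true step; a real segmentation method would then run on `smooth`
instead of `raw`.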
Here is an example of such a segmentation for yeast chromosome 1:
Tumor scoring using qPCR/microarray analysis
This is a long-term project whose goal is to design one or more molecular
signatures with prognostic value for breast cancer survival and treatment
prediction.
Breast cancer data analysis
I am involved in a number of projects concerned with the analysis of breast cancer
microarray data. One of these projects is MAQC-II, a US project aimed at validating
classifiers built on microarray predictors.