Research
Overview
Research at CBIO is dedicated to the development of models and algorithms to analyze and understand biological and chemical data. We emphasize the use of probabilistic models and statistical machine learning. Our long-term objective is, through multiple collaborations, to contribute to the development of new therapies, in particular against cancer.
An increasing number of new technologies enable the study of living organisms on a hitherto unexplored scale. For example, next-generation sequencing can be used to read the complete genetic information of a biological sample, mass spectrometry characterizes proteins expressed in a tissue, high-resolution imaging tracks changes in cell cultures, and high-throughput screening characterizes the biological activity of a large number of molecules. In addition, electronic health records contain large amounts of text data, images, and biological time series that describe the dynamics of patient diagnoses and response to treatment.
These technologies all generate huge amounts of raw data, which are often difficult to comprehend directly. To exploit these massive amounts of data more effectively, and notably to extract from them relevant biological and medical information for predictive and precision medicine, our team is developing mathematical methods and innovative algorithms. To achieve this goal, our team combines extensive expertise in mathematical modelling, statistics, machine learning, bioimage informatics, bioinformatics, and a fine understanding of the underlying biological processes, in the fields of structural biology, genetics and cancerology.
We are developing new tools and methods to examine specific questions of medical or biological interest, notably:
- In silico basic and systems biology: We develop innovative approaches to tackle various biological aspects of living systems. Our work aims notably at reconstructing biological networks from omics data, modeling tumor progression at the genomic, transcriptomic and epigenetic levels, and automatically annotating new proteins and functional elements through integration of complex and heterogeneous data, including data obtained from high-throughput sequencing, time-lapse video-microscopy or Genome Wide Association Studies.
- Towards predictive and precision medicine: We develop tools to classify tumors and identify biomarkers for diagnosis, prognosis, and prediction of drug response. These classifications are based on large amounts of data including clinical data, somatic mutations, gene and alternative transcript expression, epigenetic state, or structural DNA modification, and involve high-dimensional statistical machine learning techniques.
- Drug design: We develop new virtual screening and chemoinformatics methods. These can help identify new molecules likely to inhibit specific therapeutic targets and to lead to novel drug candidates. We make use of sequence-based, graph-based, and 3D representations of proteins and their ligands, and develop in silico chemogenomic approaches to jointly analyze the chemical space of small molecules and the biological space of protein targets, leading in particular to the prediction of secondary targets, efficacy profiles, and adverse effects.
Ongoing funded projects
- STEVE: Advancing genotype to phenotype Studies by considering Transposable Elements Variability and Epivariability (ANR 2021 – 2025).
- SYSBIO-CF (Fondation MSD-Avenir 2023-2026).
- Modeling the function of respiratory tissue in health and in the context of therapeutic applications (La Fondation Dassault Systèmes 2022-2024)
- PrAIrie Chair in Bioimage Informatics / Computer Vision (ANR 3IA, 2019 –)
- PrAIrie Chair in Computational Biology (ANR 3IA, 2019 –)
- MLFPM: Machine Learning at the Frontiers of Precision Medicine (H2020 Innovative Training Network, 2019 – 2023).
- SCAPHE: Methods for discovering SNP Combinations Associated with a PHEnotype from genome-wide data (ANR 2019 – 2022).
Past funded projects
- New drugs for Cystic Fibrosis based on machine learning (Fondation Maladies Rares 2020-2021)
- Machine learning for genome-wide association studies (collaboration with SANOFI, 2016-2019).
- CRESTNETMETABO: New challenges in the regulatory network of neural crest early development (ANR 2015-2019).
- Hi-FISH: Systematic study of gene expression at the RNA single molecule level (ANR 2014-2018).
- ABS4NGS: Algorithms, bioinformatics and software for next-generation sequencing (ANR "Investments for the future" program, 2012-2016).
- SMAC: Statistical machine learning for complex biological data (ERC 2012-2017).
- MLPM: Machine Learning for Personalised Medicine (EC-FP7 Innovative Training Network, 2012-2016).
- RADIANT: Rapid development and distribution of statistical tools for high-throughput sequencing (EC-FP7 2012-2015).
- Systems Microscopy Network of Excellence (EC-FP7 2013-2015)
- TYRO3: TYRO3, a new therapeutic target for cancer (INCA 2012-2014).
- CRESTNET: Building regulatory networks in neural crest induction: integrative approaches in vivo and in stem cells (ANR 2012-2014).
- AP'ONCALYPSE: Validation of an immune signature predicting a therapeutic response to anthracyclines in breast cancer (ANR 2012-2013)
- Structured machine learning for microbiology: mass spectrometry and high-throughput sequencing (Collaboration with Biomerieux, 2011-2014).
- Integrated analysis of methylation profiles in breast cancers (Ligue contre le cancer, 2011-2014)
- NADINE: Nanosystems for early diagnosis of neurodegenerative diseases (EC-FP7 2010-2015)
- CLARA: clustering in high dimension, algorithms and applications (ANR 2009-2013).
- Development of algorithms and databases in cancer informatics (JSPS 2008-2010)
- MGA: Graphical models and applications (ANR 2007-2011)
- RAMIS: High-resolution microscopy for screening of anti-cancer drugs (2007-2011)
- ParTox: Monitoring the toxicity of nanoparticles (ANR 2007-2009)
- Inference and learning in dynamic graphical models, with applications in speech and bio-informatics (France-Berkeley fund, 2007-2009)
- Biotype: characterization of prostate tumors by multiple technologies (MEDICEN 2007-2009)
- Machine learning for virtual screening (Carnot 2007-2009)
- DSIR: algorithms for design of siRNA (Ligue contre le cancer, 2006-2007)
- Indigo: integrated highly sensitive fluorescence-based biosensors for diagnosis applications (EC-FP7 2005-2008)
- ESBIC-D: a European systems biology infrastructure for combating complex diseases (EC-FP7 2005-2007)
- NIH: detecting genomic relations among heterogeneous genomic datasets (NIH 2004-2007)
- Kernelchip: integration of gene expression data and gene regulatory networks for the study of cancerous tumors (CNRS 2004-2007)
- GemBio: analysis of anti-malaria drug effects on P. falciparum (Mines 2004-2007)
- BioClassif: statistical learning theory for structured and high-dimensional data (CNRS 2004-2006)
- Machine learning for virtual screening (ANVAR 2004-2006)
- iBioinfo: development of methods and bioinformatics tools to analyze cell chip data (CEA 2005-2006)
- Sakura: statistical and combinatorial analysis of biological networks (JSPS 2003-2005)
- Statistical learning for the analysis of transcriptome (CNRS 2003-2004)