projects

T cell - epitope interactions

T cells play a critical role in our body’s defence against viruses, bacteria, and even cancer. They can recognize molecular fragments (so-called epitopes) presented by infected or cancerous cells and initiate a response to eliminate those cells. Here is a video showing a T cell targeting and killing a cancer cell.

Unfortunately, determining which T cells can target which epitopes remains challenging, from both computational and experimental points of view. To address this problem, I developed MixTCRpred, a state-of-the-art deep learning model that takes the T cell and epitope sequences as inputs and predicts the likelihood of their interaction. Accurate predictions are needed to speed up the development of treatments that aim to identify or engineer T cells that target cancer epitopes. Have a look at the paper and at the code on GitHub.

Modeling protein evolution

Over the course of evolution, genetic mutations accumulate in the DNA sequences that code for proteins. These mutations can lead to changes in protein amino acid sequences and affect their structure and function. Is it possible to model protein evolution, i.e. to anticipate which amino acid mutations will appear in the future?

SARS-CoV-2

Since the beginning of the COVID-19 pandemic, mutations accumulating in SARS-CoV-2 proteins have led to the emergence of highly infectious and immune-evading SARS-CoV-2 variants, a major concern for public health. We proposed a statistical model, trained on sequence data from coronaviruses circulating before the pandemic began, to identify the positions in SARS-CoV-2 proteins that are more likely to undergo mutations. Through a retrospective analysis, we showed that our model could accurately predict the polymorphic residues that have mutated relative to the very first known strain of SARS-CoV-2 (Wuhan-Hu-1) over the course of the COVID-19 pandemic, with accuracy increasing as more data becomes available. We integrated our predictions with immunological data to pinpoint mutations that are expected to induce immune escape and thus to be overrepresented in current and future SARS-CoV-2 variants of concern. Here are some slides about this project. For an introduction to the problem and to our approach, see the articles on RFI (in English) or France Culture (in French).

Escherichia coli

The idea of using evolutionary sequence data to predict protein evolution can be extended to cases beyond the SARS-CoV-2 virus. In another project, we used the proteomes of 60,000 recently diverged Escherichia coli strains to develop the theoretical framework. See the introductory article on Nature Portfolio.