publications
2024
- Deep learning predictions of TCR-epitope interactions reveal epitope-specific chains in dual alpha T cells. Giancarlo Croce, Sara Bobisse, Dana Léa Moreno, and 4 more authors. Nature Communications, 2024
T cells have the ability to eliminate infected and cancer cells and play an essential role in cancer immunotherapy. T-cell activation is elicited by the binding of the T-cell receptor (TCR) to epitopes displayed on MHC molecules, and the TCR specificity is determined by the sequence of its α and β chains. Here, we collected and curated a dataset of 17,715 αβTCRs interacting with dozens of class I and class II epitopes. We used this curated data to develop MixTCRpred, a deep learning TCR-epitope interaction predictor. MixTCRpred accurately predicts TCRs recognizing several viral and cancer epitopes. MixTCRpred further provides a useful quality control tool for multiplexed single-cell TCR sequencing assays of epitope-specific T cells and pinpoints a substantial fraction of putative contaminants in public databases. Analysis of epitope-specific dual α T cells demonstrates that MixTCRpred can identify α chains mediating epitope recognition. Applying MixTCRpred to TCR repertoires from COVID-19 patients reveals enrichment of clonotypes predicted to bind an immunodominant SARS-CoV-2 epitope. Overall, MixTCRpred provides a robust tool to predict TCRs interacting with specific epitopes and interpret TCR-sequencing data from both bulk and epitope-specific T cells.
2023
- Improved predictions of antigen presentation and TCR recognition with MixMHCpred2.2 and PRIME2.0 reveal potent SARS-CoV-2 CD8+ T-cell epitopes. David Gfeller, Julien Schmidt, Giancarlo Croce, and 7 more authors. Cell Systems, 2023
The recognition of pathogen or cancer-specific epitopes by CD8+ T cells is crucial for the clearance of infections and the response to cancer immunotherapy. This process requires epitopes to be presented on class I human leukocyte antigen (HLA-I) molecules and recognized by the T-cell receptor (TCR). Machine learning models capturing these two aspects of immune recognition are key to improving epitope predictions. Here, we assembled a high-quality dataset of naturally presented HLA-I ligands and experimentally verified neo-epitopes. We then integrated these data in a refined computational framework to predict antigen presentation (MixMHCpred2.2) and TCR recognition (PRIME2.0). The depth of our training data and the algorithmic developments resulted in improved predictions of HLA-I ligands and neo-epitopes. Prospectively applying our tools to SARS-CoV-2 proteins revealed several epitopes. TCR sequencing identified a monoclonal response in effector/memory CD8+ T cells against one of these epitopes and cross-reactivity with the homologous peptides from other coronaviruses.
- Machine learning predictions of MHC-II specificities reveal alternative binding mode of class II epitopes. Julien Racle, Philippe Guillaume, Julien Schmidt, and 8 more authors. Immunity, 2023
CD4+ T cells orchestrate the adaptive immune response against pathogens and cancer by recognizing epitopes presented on class II major histocompatibility complex (MHC-II) molecules. The high polymorphism of MHC-II genes represents an important hurdle toward accurate prediction and identification of CD4+ T cell epitopes. Here we collected and curated a dataset of 627,013 unique MHC-II ligands identified by mass spectrometry. This enabled us to precisely determine the binding motifs of 88 MHC-II alleles across humans, mice, cattle, and chickens. Analysis of these binding specificities combined with X-ray crystallography refined our understanding of the molecular determinants of MHC-II motifs and revealed a widespread reverse-binding mode in HLA-DP ligands. We then developed a machine-learning framework to accurately predict binding specificities and ligands of any MHC-II allele. This tool improves and expands predictions of CD4+ T cell epitopes and enables us to discover viral and bacterial epitopes following the aforementioned reverse-binding mode.
2022
- Epistatic models predict mutable sites in SARS-CoV-2 proteins and epitopes. Juan Rodriguez-Rivas*, Giancarlo Croce*, Maureen Muscat, and 1 more author. Proceedings of the National Academy of Sciences, 2022
During the COVID pandemic, new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants emerge and spread, some being of major concern due to their increased infectivity or capacity to reduce vaccine efficiency. Anticipating mutations, which might give rise to new variants, would be of great interest. We construct sequence models predicting how mutable SARS-CoV-2 positions are, using a single SARS-CoV-2 sequence and databases of other coronaviruses. Predictions are tested against available mutagenesis data and the observed variability of SARS-CoV-2 proteins. Interestingly, predictions agree increasingly with observations, as more SARS-CoV-2 sequences become available. Combining predictions with immunological data, we find an overrepresentation of mutations in current variants of concern. The approach may become relevant for potential outbreaks of future viral diseases.
- Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes. Lucile Vigué*, Giancarlo Croce*, Marie Petitjean, and 3 more authors. Nature Communications, 2022
Characterizing the effect of mutations is key to understanding the evolution of protein sequences and to separating neutral amino-acid changes from deleterious ones. Epistatic interactions between residues can lead to a context dependence of mutation effects. Context dependence constrains the amino-acid changes that can contribute to polymorphism in the short term, and the ones that can accumulate between species in the long term. We use computational approaches to accurately predict the polymorphisms segregating in a panel of 61,157 Escherichia coli genomes from the analysis of distant homologues. By comparing a context-aware Direct-Coupling Analysis model to a non-epistatic approach, we show that the genetic context strongly constrains the tolerable amino acids in 30% to 50% of amino-acid sites. The study of more distant species suggests the gradual build-up of genetic context over long evolutionary timescales by the accumulation of small epistatic contributions.
- Both APRIL and antibody-fragment-based CAR T cells for myeloma induce BCMA downmodulation by trogocytosis and internalization. Nicolas Camviel, Benita Wolf, Giancarlo Croce, and 3 more authors. Journal for Immunotherapy of Cancer, 2022
2020
- FilterDCA: interpretable supervised contact prediction using inter-domain coevolution. Maureen Muscat, Giancarlo Croce, Edoardo Sarti, and 1 more author. PLoS Computational Biology, 2020
Predicting three-dimensional protein structure and assembling protein complexes using sequence information are among the most prominent tasks in computational biology. Recently, substantial progress has been made in the case of single proteins using a combination of unsupervised coevolutionary sequence analysis with structurally supervised deep learning. While reaching impressive accuracies in predicting residue-residue contacts, deep learning has a number of disadvantages. The need for large structural training sets limits the applicability to multi-protein complexes, and the deep architecture of convolutional neural networks makes them intrinsically hard to interpret. Here we introduce FilterDCA, a simpler supervised predictor for inter-domain and inter-protein contacts. It is based on the fact that contact maps of proteins show typical contact patterns, which result from secondary structure and are reflected by patterns in coevolutionary analysis. We explicitly integrate averaged contact patterns with coevolutionary scores derived by Direct Coupling Analysis, improving performance over standard coevolutionary analysis while remaining fully transparent and interpretable. The FilterDCA code is available at http://gitlab.lcqb.upmc.fr/muscat/FilterDCA.
2019
- A multi-scale coevolutionary approach to predict interactions between protein domains. Giancarlo Croce, Thomas Gueudré, Maria Virginia Ruiz Cuevas, and 4 more authors. PLoS Computational Biology, 2019
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species, to correlations in amino-acid usage. Genomic databases provide rapidly growing data for variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30–50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interaction between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
- Adaptive cluster expansion for Ising spin models. Simona Cocco*, Giancarlo Croce*, and Francesco Zamponi*. The European Physical Journal B, 2019
We propose an algorithm to obtain numerically approximate solutions of the direct Ising problem, that is, to compute the free energy and the equilibrium observables of spin systems with arbitrary two-spin interactions. To this purpose we use the Adaptive Cluster Expansion method [S. Cocco, R. Monasson, Phys. Rev. Lett. 106, 090601 (2011)], originally developed to solve the inverse Ising problem, that is, to infer the interactions from the equilibrium correlations. The method consists in iteratively constructing and selecting clusters of spins, computing their contributions to the free energy and discarding clusters whose contribution is lower than a fixed threshold. The properties of the cluster expansion and its performance are studied in detail on one dimensional, two dimensional, random and fully connected graphs with homogeneous or heterogeneous fields and couplings. We discuss the differences between different representations (Boolean and Ising) of the spin variables.