New immunoinformatics tools

On the sequence side, Dr Anton Feenstra presented SeRenDIP-CE, a new random forest method for predicting epitopes from sequence.
The method derives a range of features from the antigen amino-acid sequence (172 features across a sliding window of 9 amino acids, incorporating information derived from multiple sequence alignments) and trains a random forest on those features to predict epitopes, using a set of antigens collected from our very own Structural Antibody Database (SAbDab).
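To make the sliding-window idea concrete, here is a minimal sketch of how per-residue features can be stacked across a 9-residue window to give one feature row per position. The two features in `residue_features` are placeholders I made up for illustration (they are not the SeRenDIP-CE feature set, which also draws on alignment information), and the zero-padding at the termini is likewise an assumption.

```python
WINDOW = 9  # window length quoted in the talk; assumed odd so it can be centred


def residue_features(aa):
    """Placeholder per-residue features (hypothetical, NOT the SeRenDIP-CE set)."""
    return [
        1.0 if aa in "DEKR" else 0.0,     # crude "charged" flag
        1.0 if aa in "AILMFVW" else 0.0,  # crude "hydrophobic" flag
    ]


def window_features(sequence, window=WINDOW):
    """Return one row per residue: the features of every residue in the
    window centred on it, concatenated. Positions beyond the termini are
    zero-padded (an assumption made for this sketch)."""
    half = window // 2
    per_residue = [residue_features(aa) for aa in sequence]
    pad = [0.0] * len(residue_features("A"))
    rows = []
    for i in range(len(sequence)):
        row = []
        for j in range(i - half, i + half + 1):
            row.extend(per_residue[j] if 0 <= j < len(sequence) else pad)
        rows.append(row)
    return rows


rows = window_features("ACDEFGHIK")
print(len(rows), len(rows[0]))  # one row per residue, window * features columns
```

Each row could then be paired with a per-residue epitope label and passed to a classifier such as a random forest.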
Interestingly, SeRenDIP-CE uses a transfer learning approach of sorts: its training set combines antibody-antigen interaction data with the general hetero-dimer protein-protein interaction (PPI) data used for the previous SeRenDIP model. (I say transfer learning of sorts because the model is trained once on the full combined dataset, rather than being trained on the hetero-dimer dataset first and then updated on the more specific antigen dataset.) The authors reported that this training procedure achieves considerably better results than training on either the antigen dataset or the hetero-dimer dataset alone.
This has some interesting implications for the general development of machine learning methods for antibody-antigen interactions: it suggests that, despite the dissimilarity of binding modes between antibody-antigen interactions and general PPIs, antibody machine learning methods can benefit from augmenting their training data with such PPI datasets.
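The pooled set-up described above, where the model sees both data sources in a single training run, can be sketched as follows. This is only an illustration: the function name `pool_training_sets`, the `(features, label)` example format and the toy values are all my own assumptions, not anything taken from the SeRenDIP-CE code.

```python
import random


def pool_training_sets(antigen_examples, ppi_examples, seed=0):
    """Merge two lists of (features, label) pairs into one shuffled training
    set, keeping a tag that records where each example came from."""
    pooled = [(f, y, "antibody-antigen") for f, y in antigen_examples]
    pooled += [(f, y, "hetero-dimer PPI") for f, y in ppi_examples]
    random.Random(seed).shuffle(pooled)  # fixed seed for reproducibility
    X = [f for f, _, _ in pooled]
    y = [label for _, label, _ in pooled]
    source = [s for _, _, s in pooled]
    return X, y, source


# Toy, made-up examples purely to show the shapes involved.
antigen = [([0.1, 0.9], 1), ([0.4, 0.2], 0)]
ppi = [([0.8, 0.3], 1), ([0.5, 0.5], 0), ([0.7, 0.1], 1)]

X, y, source = pool_training_sets(antigen, ppi)
print(len(X), source.count("antibody-antigen"), source.count("hetero-dimer PPI"))
```

A single model would then be fitted once on `X` and `y`. A more conventional transfer learning schedule would instead fit on the PPI examples first and update on the antigen examples afterwards. Keeping the `source` tag makes it easy to report performance on the antigen examples separately.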