Attentive cross-modal paratope prediction

Andreea Deac (1), Petar Veličković (1), Pietro Sormanni (2)

(1) Department of Computer Science and Technology, University of Cambridge, UK. (2) Department of Chemistry, University of Cambridge, UK. Correspondence to: Andreea Deac <aid25@cam.ac.uk>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s). arXiv:1806.04398v1 [stat.ML] 12 Jun 2018

Abstract

Antibodies are a critical part of the immune system, having the function of directly neutralising or tagging undesirable objects (the antigens) for future destruction. Being able to predict which amino acids belong to the paratope, the region on the antibody which binds to the antigen, can facilitate antibody design and contribute to the development of personalised medicine. The suitability of deep neural networks has recently been confirmed for this task, with Parapred outperforming all prior models. Our contribution is twofold: first, we significantly improve on the computational efficiency of Parapred by leveraging à trous convolutions and self-attention. Secondly, we implement cross-modal attention by allowing the antibody residues to attend over antigen residues. This leads to new state-of-the-art results on this task, along with insightful interpretations.

1. Introduction

Antibodies are Y-shaped proteins used by the immune system to neutralise pathogens such as bacteria and viruses. This happens when the antibody binds to unique molecules on the pathogen, called antigens. With antibodies being the most important class of biopharmaceuticals, knowing which amino acids are needed for the binding can have a significant impact on applications in diagnostics and therapeutics. In particular, creating novel antibodies requires the optimisation of properties such as solubility and stability, for which the non-binding amino acids can be used while maintaining the same binding affinity. (Throughout this document, we will frequently use the terms "amino acid" and "residue" interchangeably.)

Traditional attempts at predicting the binding amino acids (the paratope) were based on hard-coded physical models, requiring vast amounts of information. Predictors such as Antibody i-Patch (Krawczyk et al., 2013) use as input the full structural information of the antibody and the antigen, while proABC (Olimpieri et al., 2013) requires the entire antibody sequence and additional features including the antigen volume.

Only recently, Parapred (Liberis et al., 2018), a hybrid architecture consisting of convolutional and recurrent layers, has become the state-of-the-art technique. However, its usage of recurrent layers represents a performance bottleneck, and it discards the information about the target antigen entirely.

In this work, we outperform Parapred by addressing its limitations and leveraging recent techniques from the language modelling community, such as à trous convolutions (Kalchbrenner et al., 2016) and self-attention (Vaswani et al., 2017), while also significantly lowering computation time. We then further improve this result by cross-modally attending over sequential antigen information, deriving qualitative insights from the attentional coefficients in the process.

2. Dataset and Preprocessing

We used a subset of the Structural Antibody Database (SAbDab) (Dunbar et al., 2014), which provides crystal structures of antibody-antigen complexes, in order to train and evaluate our models. The subset was chosen under the same criteria as in Liberis et al. (2018):

1. Antibodies having variable domains of their heavy ($V_H$) and light ($V_L$) chains;
2. Structure resolution better than 3 Å;
3. No two antibody sequences have >95% sequence identity;
4. Each antibody has at least five amino acid residues in contact with the target antigen.

The paratope is contained within the complementarity determining regions (CDRs) of the antibody. We identify the CDRs within the sequence of each antibody using the Chothia numbering scheme (Al-Lazikani et al., 1997), and use each CDR as an independent training sequence.

For each residue in the CDR, we use the following features to obtain its feature vector, $\vec{ab}_i$:

- A one-hot encoding of the amino acid type (20 possible types + 1 additional for an unknown type);
- A one-hot encoding of the chain ID of the CDR (6 possible types: three on the heavy chain (H1, H2, H3) and three on the light chain (L1, L2, L3)). Consequently, all residues within the same CDR will receive the same encoding;
- Seven additional features, summarised by Meiler et al. (2001), representing physical, chemical and structural properties of the given amino acid type (may be seen as a fixed embedding).

[Figure 1. The Fast-Parapred architecture.]

[Figure 2. The attentional mechanism $a$ within Fast-Parapred.]

[Figure 3. Left: The self-attentional layer of Fast-Parapred. Right: The cross-modal attentional layer of AG-Fast-Parapred.]

In addition, for all of the complexes in the dataset, the antigens were proteins. This allowed us to also extract the 1D residue sequences on the antigen. There is no equivalent of CDRs on the antigen, so the entire antigen sequence is extracted, and each residue's feature vector $\vec{ag}_i$ is obtained exactly as for $\vec{ab}_i$, omitting only the one-hot encoding of the chain ID, as the antigens are not expected to have a fixed chain structure. The length of the longest CDR sequence is 32 residues, while for the longest antigen it is 1269 residues.
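As a rough illustration of the feature layout described above, the following Python sketch assembles a per-residue vector and checks the resulting lengths (21 + 6 + 7 = 34 for CDR residues, 21 + 7 = 28 for antigen residues). The function and variable names, the ordering of the three feature groups, and the all-zero placeholder standing in for the seven Meiler et al. (2001) values are assumptions made for illustration only; none of them are taken from the authors' code.

```python
# Hypothetical sketch of the per-residue feature layout (names are illustrative).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"               # the 20 standard amino acid types
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
CDR_CHAINS = ["H1", "H2", "H3", "L1", "L2", "L3"]  # the 6 CDR chain IDs
N_MEILER = 7                                       # extra features per amino acid type


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def residue_features(aa, meiler_table, chain=None):
    """Build the feature vector of one residue.

    aa           -- one-letter amino acid code; unrecognised codes use the
                    extra 'unknown' slot (index 20)
    meiler_table -- dict mapping a code to its 7 property values
                    (placeholder here; real values come from Meiler et al.)
    chain        -- CDR chain ID for antibody residues, None for antigen residues
    """
    features = one_hot(AA_INDEX.get(aa, len(AMINO_ACIDS)), len(AMINO_ACIDS) + 1)
    if chain is not None:
        features += one_hot(CDR_CHAINS.index(chain), len(CDR_CHAINS))
    features += meiler_table.get(aa, [0.0] * N_MEILER)
    return features


# Placeholder property table: zeros instead of the published values.
placeholder = {aa: [0.0] * N_MEILER for aa in AMINO_ACIDS}

cdr = [residue_features(aa, placeholder, chain="H3") for aa in "ARDYW"]
antigen = [residue_features(aa, placeholder) for aa in "MKTX"]
print(len(cdr[0]), len(antigen[0]))  # prints: 34 28
```

Under these assumptions a CDR of length 32 becomes a 32 x 34 array and an antigen of length 1269 becomes a 1269 x 28 array.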
The largest possible CDR-antigen input to our model therefore has shape ([32, 34], [1269, 28]).

3. Methods

3.1. Antibody-only

We build upon the developments of Parapred by substituting its recurrent layers with a combination of à trous convolutional layers (for efficient modelling of longer-range dependencies) and a self-attentional layer (allowing for efficiently covering the whole sequence). We will refer to this architecture as Fast-Parapred for the remainder of this paper.

A high-level layout of the architecture is presented in Figure 1. It receives as input the vector of antibody residue features $\vec{ab}$, and consists of:

- A stack of three à trous (dilated) convolutional layers:
  1. 64 features, kernel size 3, dilation rate 1;
  2. 128 features, kernel size 3, dilation rate 2;
  3. 256 features, kernel size 3, dilation rate 4.
- Then, a self-attention mechanism is applied on the computed intermediate features, $\vec{b}$.
- Lastly, a pointwise fully-connected (dense) layer is applied to classify each considered antibody amino acid as binding or non-binding.

All à trous convolutional layers and the self-attention layer employ the exponential linear unit (ELU) (Clevert et al., 2015) activation function, while the prediction layer uses the logistic sigmoid function to perform binary classification. All the layers are initialised using Xavier initialisation (Glorot & Bengio, 2010).

The leveraged self-attention mechanism (depicted in Figure 2) is the same as the one utilised by Veličković et al. (2018). Taking a set of intermediate antibody residue features $\mathbf{b} = \{\vec{b}_1, \vec{b}_2, \dots, \vec{b}_n\}$, a shared neural network is applied to all pairs of residues, producing attention coefficients:

$$e_{ij} = a(\vec{b}_i, \vec{b}_j) \tag{1}$$

indicating the importance of residue $j$'s features to residue $i$.
Here the neural network $a$ is a single-layer feedforward neural network, parametrised by a weight vector $\vec{a}$, and applying the LeakyReLU nonlinearity (with a negative input slope of 0.2):

$$e_{ij} = \mathrm{LeakyReLU}\left(\vec{a}^{T}\left[\mathbf{W}\vec{b}_i \,\|\, \mathbf{W}\vec{b}_j\right]\right) \tag{2}$$

where $\cdot^{T}$ represents transposition and $\|$ is the concatenation operation. Here, $\mathbf{W}$ is a shared, learnable linear transformation of the residue features (preserving their dimensionality at 256), adding further expressivity to the layer.

Once computed, the attention coefficients are normalised using the softmax function, for easy comparability across different residues:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})} \tag{3}$$

Lastly, using the normalised attention coefficients, we compute a linear combination of all antibody residues' features for each attending antibody residue:

$$\vec{b}'_i = \sigma\left(\sum_{j=1}^{n} \alpha_{ij} \mathbf{W}\vec{b}_j\right) \tag{4}$$

which represents the final output of the layer (summarised by Figure 3 (left)). Here $\sigma$ is the layer's activation function (the ELU, which this layer employs).

The regularisation methods used in this architecture are:

- $L_2$-regularisation (with $\lambda = 0.01$);
- Dropout (Srivastava et al., 2014) (with $p = 0.5$ on the final layer and $p = 0.15$ on all the other ones);
- Batch normalisation (Ioffe & Szegedy, 2015) on the output of each layer;
- A skip connection (He et al., 2015) over self-attention, to preserve positional information of the residues.

The model (as well as all subsequent models) is trained using the Adam SGD optimiser, with a base learning rate of 0.01 and other hyperparameters as presented in Kingma & Ba (2014), for 20 epochs with a batch size of 32.

3.2. Antibody-Antigen

With similar motivation as before, we extract features from antibody and antigen amino acid residues by applying, independently to both, a stack of three à trous convolutional layers (with exactly the same hyperparameters as for the antibody-only model). The self-attention in the antibody-only paratope predictor is then replaced with cross-modal attention of the antibody over the antigen residue features. We will refer to this architecture (presented in Figure 4) as AG-Fast-Parapred for the remainder of this document.

[Figure 4. The AG-Fast-Parapred architecture.]

We will focus on describing our cross-modal attentional layer here, as the other layers are defined exactly the same as in Fast-Parapred (with identical hyperparameters).

The input to the layer is a set of antibody residue features $\mathbf{b} = \{\vec{b}_1, \vec{b}_2, \dots, \vec{b}_M\}$, a set of antigen residue features $\mathbf{g} = \{\vec{g}_1, \vec{g}_2, \dots, \vec{g}_N\}$ and, for each antibody residue $\vec{b}_i$, a set $\nu_i$ which marks the antigen residues that are in a fixed-range neighbourhood of $\vec{b}_i$. This neighbourhood was chosen to restrict the number of antigen residues being attended over by any antibody residue to 150. The attentional coefficients are then computed using the same attention mechanism $a$ as before, using the antibody residues as the queries and the antigen features as keys and values. In addition, we now apply two learned linear transformations (parametrised by $\mathbf{W}_1$ and $\mathbf{W}_2$) to each residue in $\mathbf{b}$ and $\mathbf{g}$, respectively. The attentional coefficients are subsequently normalised using a softmax activation, fully expanded out as follows:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}\left[\mathbf{W}_1\vec{b}_i \,\|\, \mathbf{W}_2\vec{g}_j\right]\right)\right)}{\sum_{k \in \nu_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}\left[\mathbf{W}_1\vec{b}_i \,\|\, \mathbf{W}_2\vec{g}_k\right]\right)\right)} \tag{5}$$

Using the normalised attention coefficients, we then compute a linear combination of the corresponding antigen residues in the neighbourhood, for each attending antibody residue:

$$\vec{b}'_i = \sigma\left(\sum_{j \in \nu_i} \alpha_{ij} \mathbf{W}_2\vec{g}_j\right) \tag{6}$$

conveniently summarised by Figure 3 (right).

The result is, in a similar way to the antibody-only method, passed through a pointwise convolutional layer, and a logistic sigmoid non-linearity is applied, in order to classify each considered antibody amino acid residue as binding or non-binding.

We apply the same regularisation as for the antibody-only model, along with a skip connection over the cross-modal attention (which is in this case critical, as the layer entirely discards antibody features).
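The attention computations of Equations 2 to 6 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`attend`, `matvec`, `rand_matrix`) are hypothetical, the toy dimensions are far smaller than the 256 features used in the paper, ELU is assumed as the output activation, and dropout, batch normalisation and the skip connection are left out. Self-attention is treated as the special case where the keys are the antibody features themselves, the same matrix is used for both transformations, and no neighbourhood restriction is applied.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attend(queries, keys, W_q, W_k, a, neighbours=None):
    """Additive attention of each query over a set of keys.

    queries    -- antibody residue features b_i
    keys       -- b_j for self-attention, antigen features g_j for cross-modal
    W_q, W_k   -- linear transformations (W_1, W_2; pass W twice for self-attention)
    a          -- attention weight vector, length = 2 * output dimension
    neighbours -- per query, the (non-empty) list of key indices it may
                  attend over; None means every key
    Returns (outputs, weights), where weights[i] maps key index -> alpha_ij.
    """
    tq = [matvec(W_q, q) for q in queries]
    tk = [matvec(W_k, k) for k in keys]
    outputs, weights = [], []
    for i, q in enumerate(tq):
        idx = list(range(len(tk))) if neighbours is None else neighbours[i]
        # e_ij = LeakyReLU(a^T [W_q b_i || W_k key_j]); list `+` is concatenation
        scores = [leaky_relu(sum(w * x for w, x in zip(a, q + tk[j]))) for j in idx]
        # softmax over the allowed keys (shifted by the max for stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # weighted sum of the transformed keys, followed by the activation
        combo = [sum(al * tk[j][d] for al, j in zip(alpha, idx))
                 for d in range(len(tk[0]))]
        outputs.append([elu(c) for c in combo])
        weights.append(dict(zip(idx, alpha)))
    return outputs, weights


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
b = rand_matrix(4, 3)                      # 4 antibody residues, 3 features each
g = rand_matrix(6, 3)                      # 6 antigen residues
W1, W2 = rand_matrix(3, 3), rand_matrix(3, 3)
a = [random.uniform(-1, 1) for _ in range(6)]

self_out, self_alpha = attend(b, b, W1, W1, a)
nu = [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]   # toy neighbourhoods
cross_out, cross_alpha = attend(b, g, W1, W2, a, neighbours=nu)
```

In the cross-modal call the output for each antibody residue is built only from transformed antigen features, which is why the text stresses the skip connection: without it, the antibody's own features would not reach the classifier.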
4. Results

4.1. Quantitative Results

We perform ten runs of 10-fold crossvalidation (with 10 distinct splits of the data into 10 folds) on Parapred, Fast-Parapred and AG-Fast-Parapred. For each, we monitor ROC-AUC, Matthews correlation coefficient (MCC, which we also report for proABC; Table 1), the wall-clock time it takes to perform one epoch of training, and the precision/recall curve (which we also report for Antibody i-Patch; Figure 5), along with 95% confidence intervals.

Table 1. Comparative evaluation results with highlighted 95% confidence intervals, after ten runs of 10-fold crossvalidation.

| Method | ROC AUC | MCC | Epoch time |
|---|---|---|---|
| proABC (Olimpieri et al., 2013) | 0.851 | 0.522 | - |
| Parapred (Liberis et al., 2018) | 0.880 ± 0.002 | 0.564 ± 0.007 | 0.190 ± 0.019 s |
| Fast-Parapred (ours) | 0.883 ± 0.001 | 0.572 ± 0.004 | 0.085 ± 0.015 s |
| AG-Fast-Parapred (ours) | 0.899 ± 0.004 | 0.598 ± 0.012 | 0.178 ± 0.020 s |

[Figure 5. Precision-Recall curves (precision against recall) for Parapred, Fast-Parapred, AG-Fast-Parapred and Antibody i-Patch, with highlighted 95% confidence intervals, after ten runs of 10-fold crossvalidation.]

Our results demonstrate that:

- Fast-Parapred achieves a state-of-the-art-level result on antibody-only paratope prediction, while requiring roughly half the computational time of Parapred (0.085 s versus 0.190 s per epoch);
- AG-Fast-Parapred significantly outperforms this result, for the first time successfully leveraging antigen information in a deep paratope predictor, while relying solely on convolutional and attentional layers, removing the dependency on recurrent layers entirely.

It should be noted that AG-Fast-Parapred still improves on the epoch time of Parapred, despite working with input sizes that are up to 40× larger.
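For readers unfamiliar with the reported metrics, the sketch below shows one way to compute the Matthews correlation coefficient from binary predictions and to summarise repeated runs as a mean with a 95% confidence interval. The text does not say how probabilities were thresholded or how the intervals were obtained, so the binary inputs and the normal-approximation interval (mean ± 1.96 standard errors) are assumptions, as are the function names and the toy numbers.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels given as 0/1 sequences of equal length."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mean_with_ci95(values):
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, 1.96 * math.sqrt(variance / n)


# Toy example: 1 = binding residue, 0 = non-binding residue.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
mcc = matthews_corrcoef(y_true, y_pred)

# Toy per-run scores (not values from the paper).
mean, half_width = mean_with_ci95([0.88, 0.90, 0.86])
```

MCC ranges from -1 to 1 and takes class imbalance into account, which matters here because only a minority of CDR residues are in contact with the antigen.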
4.2. Qualitative Results

We visualise, using PyMOL, the computed binding probabilities of AG-Fast-Parapred, on a test antibody-antigen complex, in Figure 6 (left), revealing that its neural network has learnt to appropriately infer positional information (predicting higher probabilities for the residues closer to the antigen), without being given any 3D coordinates. The attentional coefficients computed by AG-Fast-Parapred are also visualised, for a single antibody residue, in Figure 6 (right). From these we may observe that the attentional mechanism will tend to assign larger importances to antigen residues that are closer to this antibody residue, indicating the usefulness of the cross-modal attentional mechanism, and potentially hinting at a joint method for predicting antigen binding sites (epitopes), which we leave for future work.

[Figure 6. Best viewed in colour. For a test antibody-antigen complex: Left: Antibody residue binding probabilities to the antigen (in gold) assigned by AG-Fast-Parapred. Right: Normalised antigen attention weights for a single (binding) antibody residue (in red). Warmer colours indicate higher probabilities/coefficients.]

References

Al-Lazikani, Bissan, Lesk, Arthur M, and Chothia, Cyrus. Standard conformations for the canonical structures of immunoglobulins. Journal of molecular biology, 273(4):927–948, 1997.

Clevert, Djork-Arné, Unterthiner, Thomas, and Hochreiter, Sepp. Fast and accurate deep network learning by exponential linear units (elus). CoRR, abs/1511.07289, 2015. URL http://arxiv.org/abs/1511.07289.

Dunbar, James, Krawczyk, Konrad, Leem, Jinwoo, Baker, Terry, Fuchs, Angelika, Georges, Guy, Shi, Jiye, and Deane, Charlotte M. Sabdab: the structural antibody database. Nucleic Acids Research, 42(D1):D1140–D1146, 2014. doi: 10.1093/nar/gkt1043. URL http://dx.doi.org/10.1093/nar/gkt1043.

Glorot, Xavier and Bengio, Yoshua. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256, 2010.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. URL http://arxiv.org/abs/1512.03385.

Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015. URL http://arxiv.org/abs/1502.03167.

Kalchbrenner, Nal, Espeholt, Lasse, Simonyan, Karen, van den Oord, Aaron, Graves, Alex, and Kavukcuoglu, Koray. Neural machine translation in linear time. CoRR, abs/1610.10099, 2016. URL http://arxiv.org/abs/1610.10099.

Kingma, Diederik P and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Krawczyk, Konrad, Baker, Terry, Shi, Jiye, and Deane, Charlotte M. Antibody i-patch prediction of the antibody binding site improves rigid local antibody-antigen docking. Protein Engineering, Design and Selection, 26(10):621–629, 2013. doi: 10.1093/protein/gzt043. URL http://dx.doi.org/10.1093/protein/gzt043.

Liberis, Edgar, Veličković, Petar, Sormanni, Pietro, Vendruscolo, Michele, and Liò, Pietro. Parapred: Antibody paratope prediction using convolutional and recurrent neural networks. Bioinformatics, 1:7, 2018.

Meiler, Jens, Müller, Michael, Zeidler, Anita, and Schmäschke, Felix. Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks. Molecular modeling annual, 7(9):360–369, 2001.

Olimpieri, Pier Paolo, Chailyan, Anna, Tramontano, Anna, and Marcatili, Paolo. Prediction of site-specific interactions in antibody-antigen complexes: the proabc method and server. Bioinformatics, 29(18):2285–2291, 2013. doi: 10.1093/bioinformatics/btt369. URL http://dx.doi.org/10.1093/bioinformatics/btt369.

Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014. URL http://jmlr.org/papers/v15/srivastava14a.html.

Vaswani, Ashish, Shazeer, Noam, Parmar, Niki, Uszkoreit, Jakob, Jones, Llion, Gomez, Aidan N., Kaiser, Lukasz, and Polosukhin, Illia. Attention is all you need. CoRR, abs/1706.03762, 2017. URL http://arxiv.org/abs/1706.03762.

Veličković, Petar, Cucurull, Guillem, Casanova, Arantxa, Romero, Adriana, Liò, Pietro, and Bengio, Yoshua. Graph Attention Networks. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ.
Statistics – arXiv (Cornell University)
Published: Jun 12, 2018