Department of Biochemistry and Molecular Biology | Sealy Center for Structural Biology | Computational Biology |
SDAP 2.0 - Structural Database of Allergenic Proteins
AllergenAI (New!): A deep learning model predicting allergenicity based on protein sequence
AllergenAI overview
Innovations in protein engineering can help redesign allergenic proteins to reduce adverse reactions in sensitive individuals. Accomplishing this requires better knowledge of the molecular properties of allergenic proteins and of the molecular features that make a protein allergenic. We present a novel AI-based tool, AllergenAI, to quantify the allergenic potential of a given protein. Our approach is based solely on protein sequences, which differentiates it from previous tools that use some knowledge of the allergens' physicochemical and other properties in addition to sequence homology. We used the protein sequences of allergenic proteins archived in three well-established databases, SDAP 2.0, COMPARE, and AlgPred 2, to train a convolutional neural network (CNN) and assessed its prediction performance by cross-validation. We then used AllergenAI to find novel potential allergens of the cupin family in date palm, spinach, maize, and red clover: proteins with a high allergenicity score that might cause adverse allergic reactions in sensitive individuals. By analyzing the feature importance scores (FIS) of vicilins, we identified a proline-alanine-rich (P-A) motif in the top 50% of FIS regions that overlapped with known IgE epitope regions of vicilin allergens. Furthermore, using ~1600 allergen structures in our SDAP database, we showed the potential to incorporate 3D information in a CNN model. In the future, incorporating 3D information into the training data should improve accuracy. AllergenAI is a novel foundation for identifying the critical features that distinguish allergenic proteins.
If SDAP is used in publications, please cite:
For more recent publications related to SDAP, please cite:
We use the IUIS nomenclature and treat the IUIS allergens in SDAP as the official set of allergens, as these allergens have been reviewed by a committee of experts in the field. However, we also included other proteins in SDAP when they were listed in an allergen database at the time SDAP was established. We keep these non-IUIS allergens in SDAP (clearly marked as non-IUIS) as a service for allergen researchers who are exploring and studying proteins that might elicit an allergenic response. In most cases, additional information on these proteins can be found by searching literature databases.
The SDAP project is supported by grants from the National Institutes of Health (2R56AI064913), the U.S. Environmental Protection Agency under a STAR Research Assistance Agreement (No. RD 834823), the National Institutes of Health (1RO1AI165866-01), and by the Margaret Maccallum Gage and Tracy Davis Gage foundation.
Access to SDAP is available free of charge for academic and non-profit use. Licenses for commercial use can be obtained by contacting W. Braun (webraun@utmb.edu).
Recent SDAP developments:
The previous version of SDAP was developed by Dr. Ovidiu Ivanciuc at The University of Texas Medical Branch, Galveston, TX. The current version of SDAP is developed by Dr. Surendra Negi at The University of Texas Medical Branch, Galveston, TX.