The University of Texas Medical Branch
Department of Biochemistry and Molecular Biology | Sealy Center for Structural Biology | Computational Biology



         SDAP 2.0 - Structural Database of Allergenic Proteins

Access to SDAP is available free of charge for academic and non-profit use. Licenses for commercial use can be obtained by contacting W. Braun (webraun@utmb.edu). Secure access to SDAP is available from https://fermi.utmb.edu/SDAP

AllergenAI: A deep learning model predicting allergenicity based on protein sequence
AllergenAI overview
  • Innovations in protein engineering can help redesign allergenic proteins to reduce adverse reactions in sensitive individuals. This requires a better understanding of the molecular properties of allergenic proteins and of the features that make a protein allergenic. We present a novel AI-based tool, AllergenAI, to quantify the allergenic potential of a given protein. Our approach is based solely on protein sequences, which distinguishes it from previous tools that use physicochemical and other properties of allergens in addition to sequence homology. We trained a convolutional neural network (CNN) on the allergen protein sequences archived in three well-established databases, SDAP 2.0, COMPARE, and AlgPred 2, and assessed its prediction performance by cross-validation. We then used AllergenAI to identify cupin-family proteins in date palm, spinach, maize, and red clover with high allergenicity scores, which might cause adverse allergic reactions in sensitive individuals. By analyzing the feature importance scores (FIS) of vicilins, we identified a proline-alanine-rich (P-A) motif in the top 50% of FIS regions that overlapped with known IgE epitope regions of vicilin allergens. Furthermore, using approximately 1600 allergen structures in our SDAP database, we showed the potential to incorporate 3D information in a CNN model. In the future, incorporating 3D information in the training data should enhance the accuracy. AllergenAI is a novel foundation for identifying the critical features that distinguish allergenic proteins.
  • AllergenAI prediction:
       (Please note: AllergenAI is trained on allergenic proteins with fewer than 1000 amino acids)


                          


    Help: Enter the protein sequence in FASTA format, for example:
    >tr|A5Z1Q9|A5Z1Q9_ARAIP Ara h 2 allergen OS=Arachis ipaensis OX=130454 GN=Ara i 2.02 PE=2 SV=1
    MAKLTILVALALFLLAAHASARQQWELQGDRRCQSQLERANLRPCEQHLMQKIQRDEDSY
    GRDPYSPSQDPYSPSQDPDRRDPYSPSPYDRRGAGSSQHQERCCNELNEFENNQRCMCEA
    LQQIMENQSDRLQGRQQEQQFKRELRNLPQQCGLRAPQRCDLEVESGGRDRY
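
    Before submitting a sequence, it can help to confirm that the input is well-formed FASTA and respects the length limit noted above. The following is a minimal sketch, not part of the AllergenAI code: the function names `parse_fasta` and `check_sequence` are hypothetical, and treating any letter outside the 20 standard amino acids as a problem is an assumption, since the page does not say how the server handles such residues.

```python
from typing import Dict, List

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# The page states the model was trained on proteins with fewer than 1000 residues.
MAX_LENGTH = 1000


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {header: sequence} for each record in a FASTA-formatted string."""
    records: Dict[str, str] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are concatenated and upper-cased.
            records[header] += line.upper()
    return records


def check_sequence(sequence: str) -> List[str]:
    """Return a list of problems found in a sequence (empty list means none)."""
    problems: List[str] = []
    if not sequence:
        problems.append("sequence is empty")
    if len(sequence) >= MAX_LENGTH:
        problems.append(f"length {len(sequence)} is not below {MAX_LENGTH}")
    unknown = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unknown:
        problems.append("non-standard residue codes: " + ", ".join(unknown))
    return problems


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    example = """>demo_protein hypothetical record
MAKLTILVALALFLLAAHASARQQWELQGDRRCQSQLERANLRPCEQHLMQKIQRDEDSY
"""
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), check_sequence(seq) or "ok")
```

    Note that records sharing the same header line overwrite each other in this sketch; use unique headers when checking a multi-record file.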
    

    Training and validation data
    Training data
    - one-hot encoded protein matrix (allergens, positive)
    - protein information index in the one-hot matrix (allergens, positive)
    - one-hot encoded protein matrix (non-allergens, negative)
    - protein information index in the one-hot matrix (non-allergens, negative)

    Cupin proteins in SDAP 2.0
    - one-hot encoded protein matrix (cupin proteins in SDAP)
    - protein information index in the one-hot matrix (cupin proteins in SDAP)

    Non-allergen proteins in the cupin Pfam
    - one-hot encoded protein matrix (non-allergenic cupin)
    - protein information index in the one-hot matrix (non-allergenic cupin)

    Modeling with SDAP allergens (with protein sequence and protein 3D structure information)
    - one-hot encoded protein matrix (allergens, positive)
    - protein information index in the one-hot matrix (allergens, positive)
    - one-hot encoded protein matrix (non-allergens, negative)
    - protein information index in the one-hot matrix (non-allergens, negative)

    Modeling with SDAP allergens (with protein sequence, but without protein 3D structure information)
    - one-hot encoded protein matrix (allergens, positive)
    - protein information index in the one-hot matrix (allergens, positive)
    - one-hot encoded protein matrix (non-allergens, negative)
    - protein information index in the one-hot matrix (non-allergens, negative)

    Available models and code for AllergenAI
  • AllergenAI code
    - AllergenAI model with full training data
    - a model with SDAP allergen protein sequence information
    - a model with SDAP allergen protein sequence and 3D structure information

  • Run pre-processing of the input protein sequence
    - Pre-process: build the one-hot encoded protein matrix
    command: python AllergenAI_preprocess.py input.fasta
    example command: python AllergenAI_preprocess.py Cupin.fasta
    example FASTA file: Cupin.fasta
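
    To illustrate what a one-hot encoded protein matrix looks like, here is a minimal sketch. It is not the code in AllergenAI_preprocess.py, whose exact output layout is not described on this page. The column order of the 20 amino acids, the zero-padding to a fixed length of 1000 rows, the all-zero row for non-standard residues, and the name `one_hot_encode` are all assumptions made for illustration.

```python
from typing import List

# Assumed column order: the 20 standard amino acids, alphabetical by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str, max_len: int = 1000) -> List[List[int]]:
    """Encode a protein sequence as a max_len x 20 matrix of 0s and 1s.

    Row i has a single 1 in the column of residue i. Rows beyond the end of
    the sequence, and rows for non-standard residues, stay all zero
    (an assumption of this sketch).
    """
    if len(sequence) > max_len:
        raise ValueError(f"sequence length {len(sequence)} exceeds {max_len}")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for position, residue in enumerate(sequence.upper()):
        column = AA_INDEX.get(residue)
        if column is not None:
            matrix[position][column] = 1
    return matrix


if __name__ == "__main__":
    encoded = one_hot_encode("MAKL", max_len=6)
    for row in encoded:
        print("".join(str(v) for v in row))
```

    A fixed number of rows gives every protein the same input shape, which is what a convolutional network typically expects; the real pre-processing script may pad or index proteins differently.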

  • Run AllergenAI
    - Predict the allergenicity of your protein by running the AllergenAI model
    command: python Run_AllergenAI.py input.txt
    example command: python Run_AllergenAI.py Cupin.txt
    example input (the concatenated one-hot encoded matrix of your proteins, produced in the pre-processing step): Cupin.txt

  • Requirements
    Software and packages needed to train and run AllergenAI
    - Python 3
    - packages: tensorflow, keras 2.11, numpy, and pandas
    # create and activate a conda environment with Python 3
    conda update conda
    conda create -n allergenai python=3
    conda activate allergenai
    
    # install requirements
    conda install numpy
    conda install pandas
    conda install tensorflow
    pip install --upgrade tensorflow
    conda install keras=2.11
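
    After installation, a quick check from inside the activated environment can confirm which versions were actually installed. This is a minimal sketch using only the Python standard library; the helper name `installed_versions` is hypothetical, and the package names are taken from the requirements list above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Packages listed in the requirements above.
REQUIRED_PACKAGES = ["numpy", "pandas", "tensorflow", "keras"]


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

    If keras reports a version other than 2.11, the saved models may not load as expected; that compatibility concern is an inference from the pinned version above, not something this page states.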
    
  • Mirror website at UT Houston:
    Visit https://compbio.uth.edu/AllergenAI/

    This site is published by Surendra Negi.
    Copyright 2001-2023 The University of Texas Medical Branch. Please review our privacy policy and Internet guidelines.