SDAP - Structural Database of Allergenic Proteins
AllerML - Markup Language for Allergens
The Allergen Markup Language (AllerML) described here is a first step toward automated tools for accessing allergen data held in multiple databases. AllerML is based on IUIS nomenclature and consists of a hierarchical set of tags that describe the most important information normally contained in allergen databases, including common names, sources, sequence, structure, IgE and T-cell epitopes, and cross-reactivity. AllerML also includes tags for attributes, such as Pfam classifications, that link allergen-specific databases to other general-purpose biological data sets. In its current form, AllerML can be used to automate the dynamic exchange of information on allergens, to incorporate data on new allergenic proteins as they are identified, and to support computational and bioinformatics studies of allergenicity and clinically significant cross-reactivity. Wide implementation of AllerML will simplify automatic data exchange between allergen databases and improve data access for integrated computational and bioinformatics analysis.
The AllerML tags proposed here encode all molecular information on allergens and IgE epitopes that is present in the major allergen databases. For each allergen in SDAP, the AllerML record can be obtained from the “Translate to AllerML” link, located on that allergen's SDAP page immediately below the title line containing the allergen name.
Table 1. AllerML Tags (only the start tag is shown for each section)
Allergen Name and Taxonomy
An example of the implementation of AllerML for Ara h 3 is shown in Scheme 1. The first part of the record indicates the IUIS allergen name and type (IUIS or non-IUIS). Other unique identifiers from each allergen database can be included (if available) using appropriate tags. The SDAP ID is given here as an example. The second part of the record contains source organism information including the accession number from the NCBI taxonomy database (http://www.ncbi.nlm.nih.gov/Taxonomy/). Comments may be included in any AllerML section under the tag <AllerML_Comment>, and may contain HTML tags used to format the text for display or to include other HTML elements such as tables or hyperlinks.
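As a rough illustration of how a client might read the two parts described above (name and type, then source organism) together with a comment, the sketch below parses a small AllerML-like record with Python's standard library. Only `<AllerML_Comment>` is taken from the text. Every other tag and attribute name (`AllerML`, `Allergen`, `Name`, `type`, `Source`, `Species`) is a hypothetical placeholder and not the actual schema, which is defined in Table 1. The sketch also assumes that HTML inside a comment is stored as escaped text.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: apart from <AllerML_Comment>, all tag and attribute
# names are placeholders chosen for illustration, not the real AllerML tags.
RECORD = """\
<AllerML>
  <Allergen>
    <Name type="IUIS">Ara h 3</Name>
    <Source>
      <Species>Arachis hypogaea</Species>
    </Source>
    <AllerML_Comment>Example &lt;b&gt;comment&lt;/b&gt; with HTML markup</AllerML_Comment>
  </Allergen>
</AllerML>
"""


def summarize(xml_text):
    """Extract name, type, source species, and comment from one record."""
    root = ET.fromstring(xml_text)
    allergen = root.find("Allergen")
    name = allergen.find("Name")
    return {
        "name": name.text,
        "type": name.get("type"),  # IUIS or non-IUIS
        "species": allergen.findtext("Source/Species"),
        # Escaped HTML is unescaped by the parser and returned as plain text.
        "comment": allergen.findtext("AllerML_Comment"),
    }


if __name__ == "__main__":
    for key, value in summarize(RECORD).items():
        print(f"{key}: {value}")
```

Because the record is hierarchical, a consumer can address each section by path and ignore sections it does not need. A real client would substitute the tag names from Table 1 for the placeholders used here.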
Scheme 1. Core information of Ara h 3 as an AllerML document
Cross-references to Other Web Databases
Other protein databases also contain information relevant to allergenic proteins. We propose the following set of AllerML tags to link to the most relevant of these resources:
Scheme 2. Cross-references section of Ara h 3
Scheme 3. Protein section of Ara h 3
Scheme 4. Epitope section of Ara h 3
Allergen INSCH Motifs
Scheme 5. AllerML encoding for the INSCH motif section of Ara h 3.
Allergen MotifMate Motifs
Scheme 6. AllerML encoding for the MotifMate motifs section of Ara h 3.
IgE Cross-reactive Peptides
Scheme 7. AllerML encoding of quantitative data regarding allergen cross-reactivity with peptides for Jun a 1