------------------------------------------------------------------------------
SWISS-PROT Protein Sequence Data Bank. Release 36.0, July 1998
------------------------------------------------------------------------------
Submission of sequence data to SWISS-PROT
------------------------------------------------------------------------------

SWISS-PROT at the EBI
The EMBL Outstation - The European Bioinformatics Institute (EBI)
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

------------------------------------------------------------------------------
Document name: SUBMIT.TXT
------------------------------------------------------------------------------

INTRODUCTION

The SWISS-PROT protein data bank provides accession numbers for protein
sequences when the peptide(s) have been directly sequenced. These sequences
should be submitted to SWISS-PROT at the EBI.

!!! Important note !!!

We do not provide accession numbers, IN ADVANCE, for protein sequences that
are the result of translation of nucleic acid sequences. These translations
will automatically be forwarded to us.

INFORMATION FOR SUBMITTERS

1. How to submit data to the SWISS-PROT data bank
2. Sequence Data Submission Form
3. Format for submitting data
4. How long will it take to get an accession number?
5. Data security
6. Updating your data
7. Citation updates

How to contact SWISS-PROT: see APPENDIX I.
The data submission form itself: see APPENDIX II.

----------------------------------------------------------------------

1. How to submit data to the SWISS-PROT data bank

Data can be sent to the EBI in one of several ways:

(a) Electronic file transfer: files can be sent via computer network to
    DATASUBS@EBI.AC.UK
(b) Floppy disks: Macintosh and IBM-compatible diskettes. Please use the
    'save as text only' option (i.e. ASCII format).
(c) Normal post.

Note: See APPENDIX I for the different ways to contact SWISS-PROT at EBI.
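
The 'save as text only' (ASCII) requirement above can be checked before a
file is sent. The following is only a minimal sketch in Python, not a tool
provided by SWISS-PROT or the EBI; the function name find_non_ascii is
hypothetical. It reports every character that falls outside the 7-bit ASCII
range, which usually indicates that the file was saved in a word-processor
or other non-text format.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character.

    Line and column numbers start at 1. This is an illustrative helper,
    not part of any SWISS-PROT or EBI software.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # outside the 7-bit ASCII range
                problems.append((line_no, col, ch))
    return problems
```

An empty result means the text is plain ASCII; any reported position should
be corrected (or the file re-saved as text only) before submission.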

When we receive your data we will assign them an accession number, which
serves as a reference that permanently identifies them in the database. We
will inform you what accession number your data have been given and we
recommend that you cite this number when referring to these data in
publications. If your manuscript has already been accepted for publication,
the accession number can be included at the galley proof stage as a note
added in proof. We suggest that the note should read approximately as
follows:

"The sequence data reported in this paper will appear in the SWISS-PROT
Databank under the accession number(s) xxxxxxx."

2. Sequence Data Submission Form

The data submission form solicits all of the information needed to make a
database entry; that is, the primary sequence data together with descriptive
information such as the source of the sequenced segment (e.g., organism,
strain, tissue) and the location of interesting regions within the sequence
(e.g., coding regions, regulatory signals). It also contains information
about data formats.

The data submission form exists in:

(a) Paper form: printed in the first issue of Nucleic Acids Research each
    year and available upon request from EBI.
(b) Computer-readable form: included as APPENDIX II of this document.

Please answer all questions which apply to your data. If you submit 2 or
more sequences, copy and fill out this form for each additional sequence.
Please include in your submission any additional sequence data which are not
reported in your manuscript but which have been reliably determined. Then
send (1) this form, (2) a copy of your manuscript (if available) and (3)
your sequence data (in machine readable form) to the address given in
APPENDIX I.

3. Format for submitting data

o Please ensure that each line in your file is not longer than 80
  characters; longer lines often get truncated when they are sent.
o Each sequence should include the names of the authors.
o Each distinct sequence should be listed separately using the same number
  of bases/residues per line. The length of each sequence in residues
  should be clearly indicated.
o Enumeration must begin with "1" and continue in the direction 5' to 3'
  (or amino- to carboxy-terminus).
o Amino acid sequences should be listed using the one-letter code.
o The code for representing the sequence characters should conform to the
  IUPAC-IUB standards, which are described in:
  - Nucl. Acids Res. 13: 3021-3030 (1985) (for nucleic acids).
  - J. Biol. Chem. 243: 3557-3559 (1968) (for amino acids).
  - Eur. J. Biochem. 5: 151-153 (1968) (for amino acids).

4. How long will it take to get an accession number?

We will process data submissions within 7 working days of receipt and send
authors notification of what accession number(s) their data have been
assigned. There are several things authors can do to minimise the time it
takes to get an accession number:

(a) Be sure that submissions include all the necessary information and that
    all relevant questions on the data submission form have been answered.
(b) Check the data to be sure that they do not contain
    inconsistencies/errors.
(c) Be sure to include a computer network address, a telefax number, or
    both.

5. Data security

Authors will be asked whether their submitted data can be made available to
the public immediately or whether they should be withheld until an
author-specified date. Data are never withheld after publication.

6. Updating your data

Data can be updated by completing an update form, which is available via:

(a) Anonymous FTP site - FTP.EBI.AC.UK in the file:
    pub/databases/embl/release/update.doc
(b) WWW, using the URL:
    http://www.expasy.ch/sprot/sp_update_form.html

Please always remember to cite any relevant accession number(s).

7. Citation updates

Most submissions represent data that have not yet been accepted for
publication and therefore a full journal citation for the data is not
available when the entry is created. We therefore urge researchers to let us
know when and where data they have submitted to us are published, and to
include relevant accession numbers in these publications.

------------------------------------------------------------------------------
------------------------------------------------------------------------------

APPENDIX I. HOW TO CONTACT THE SWISS-PROT DATA BANK AT THE EBI

For submissions to SWISS-PROT the following can be used:

(a) Computer network:
    datasubs@EBI.AC.UK   for data submissions
    datalib@EBI.AC.UK    for other enquiries
    update@EBI.AC.UK     for updates and notification of publication

(b) Postal address:
    SWISS-PROT Submissions,
    The EMBL Outstation - The European Bioinformatics Institute (EBI)
    Wellcome Trust Genome Campus
    Hinxton
    Cambridge CB10 1SD
    United Kingdom

(c) Telephone:
    +44 1223 494499   (for data submissions)
    +44 1223 494444   (general)

(d) Telefax:
    +44 1223 494472   (for data submissions)
    +44 1223 494468   (general)

------------------------------------------------------------------------------
------------------------------------------------------------------------------

APPENDIX II. SEQUENCE DATA SUBMISSION FORM

SEQUENCE DATA SUBMISSION FORM

This form solicits the information needed for a nucleotide or amino acid
sequence database entry. By completing and returning it to us promptly you
help us to enter your data in the database accurately and rapidly.
These data will be shared among the following databases:

  - DDBJ Database (DNA Data Bank of Japan; Mishima, Japan);
  - EMBL Nucleotide Sequence Database (EBI, Cambridge, UK);
  - GenBank (NCBI, Bethesda, USA);
  - Swiss-Prot Protein Sequence Database (Geneva, Switzerland and EBI);
  - International Protein Information Database in Japan (JIPID; Noda,
    Japan);
  - Martinsried Institute for Protein Sequence Data (MIPS; Martinsried,
    FRG); and
  - National Biomedical Research Foundation Protein Identification Resource
    (NBRF-PIR; Washington, D.C., USA).

Please answer all questions which apply to your data. If you submit 2 or
more non-contiguous sequences, copy and fill out this form for each
additional sequence. Please include in your submission any additional
sequence data which are not reported in your manuscript but which have been
reliably determined (for example, introns or flanking sequences). When
submitting nucleic acid sequences containing protein coding regions, also
include a translation (SEPARATELY from the nucleic acid sequence).
Independently sequenced peptides receive SWISS-PROT accession numbers. Then
send (1) this form, (2) a copy of your manuscript (if available) and (3)
your sequence data (in machine readable form) to the address shown below.

Information about the various ways you can send us your data and about
formats for the sequence data is given in the following two sections.
Thank you.

------------------------------------------------------------------------------

We are happy to accept data submitted in either of the following ways:

(1) Electronic file transfer: files can be sent via Internet to:
    DATASUBS@EBI.AC.UK (ask your local network expert for help or phone
    us). Please ensure that each line in your file is not longer than 80
    characters; longer lines often get truncated when they are sent.

(2) Floppy disks: we can read Macintosh and IBM-compatible diskettes.
    Please use the 'save as text only' feature of your editor to save your
    submission (i.e., in ASCII format), as otherwise we might have
    difficulty processing it.

The EMBL Data Library can be contacted as follows:

EMBL Nucleotide Sequence Submissions    E-mail     DATASUBS@EBI.AC.UK
European Bioinformatics Institute       Telefax    +44 (0)1223 494472
Hinxton Hall, Hinxton                   Telephone  +44 (0)1223 494400
Cambridge CB10 1RQ, UK.

------------------------------------------------------------------------------

When we receive your data we will assign them an accession number, which
serves as a reference that permanently identifies them in the database. We
will inform you what accession number your data have been given and we
recommend that you cite this number when referring to these data in
publications. If your manuscript has already been accepted for publication,
the accession number can be included at the galley proof stage as a note
added in proof. So that we can process your data and inform you of your
accession number before you receive the galley proofs, please return this
form to us as soon as possible. We suggest that the note added in proof
should read approximately as follows:

"The nucleotide sequence data reported in this paper will appear in the
EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession
number(s) xxxxxxx."

FORMATS FOR SUBMITTED DATA

We would appreciate receiving the sequence data formatted as follows:

Each sequence should include the names of the authors. Each distinct
sequence should be listed separately using the same number of bases/residues
per line. The length of each sequence in bases/residues should be clearly
indicated.

Enumeration must begin with "1" and continue in the direction 5' to 3' (or
amino- to carboxy-terminus).

Amino acid sequences should be listed using the one-letter code.

Translations of protein coding regions in nucleotide sequences should be
submitted in a separate computer file from the nucleotide sequences
themselves.

The code for representing the sequence characters should conform to the
IUPAC-IUB standards, which are described in: Nucl. Acids Res. 13: 3021-3030
(1985) (for nucleic acids) and J. Biol. Chem. 243: 3557-3559 (1968) and
Eur. J. Biochem. 5: 151-153 (1968) (for amino acids).

I. GENERAL INFORMATION

==============================================================================
Your last name              first name              middle initials
------------------------------------------------------------------------------
Institution
------------------------------------------------------------------------------
Address
------------------------------------------------------------------------------
Computer mail address
------------------------------------------------------------------------------
Telephone                               Telefax number
==============================================================================
On what medium and in what format are you sending us your sequence data?
(see instructions at the beginning of this form)

[ ] electronic mail
[ ] diskette    computer:               operating system:
                editor:                 filename:
==============================================================================

II. CITATION INFORMATION

==============================================================================
These data represent  [ ] new submission
                      [ ] correction (Accession number:            )
==============================================================================
These data are  [ ] published    [ ] in press    [ ] submitted
                [ ] in preparation    [ ] no plans to publish
                [ ] Thesis/Dissertation
------------------------------------------------------------------------------
authors
------------------------------------------------------------------------------
title of paper
------------------------------------------------------------------------------
journal                         volume, first-last pages, year
------------------------------------------------------------------------------
Do you agree that these data can be made available immediately?
[ ] yes    [ ] no, they can be made available after:            (date)
Data published before the stated date will be made available on publication
==============================================================================
Does the sequence which you are sending with this form include data that do
NOT appear in the above citation?

[ ] no    [ ] yes, from position ___________ to ___________
              [ ] bases OR [ ] amino acid residues

(If your sequence contains 2 or more such spans, use the feature table in
section IV to indicate their positions)

If so, how should these data be cited in the database?

[ ] published    [ ] in press    [ ] submitted
[ ] in preparation    [ ] no plans to publish
[ ] Thesis/Dissertation
------------------------------------------------------------------------------
authors
------------------------------------------------------------------------------
address (if different from that given in section I)
------------------------------------------------------------------------------
title of paper
------------------------------------------------------------------------------
journal                         volume, first-last pages, year
==============================================================================
List references to papers and/or database entries which report sequences
overlapping with that submitted here.

1st author    journal, vol., pages, year and/or database, accession number
------------------------------------------------------------------------------
------------------------------------------------------------------------------
==============================================================================

III. DESCRIPTION OF SEQUENCED SEGMENT

Wherever possible, please use standard nomenclature or conventions. If a
question is not applicable to your sequence, answer by writing N.A. in the
appropriate space; if the information is relevant but not available, write a
question mark (?).
==============================================================================
What kind of molecule did you sequence? (check all boxes which apply)

[ ] genomic DNA      [ ] genomic RNA
[ ] cDNA to mRNA     [ ] cDNA to genomic RNA
[ ] organelle DNA    [ ] organelle RNA    please specify organelle:
[ ] tRNA    [ ] rRNA    [ ] snRNA    [ ] scRNA

for viruses:
    [ ] virus or [ ] provirus or [ ] viroid
    [ ] DNA or [ ] RNA
    [ ] ds or [ ] ss or [ ] circular
    [ ] enveloped or [ ] nonenveloped

[ ] other nucleic acid. please specify:
[ ] peptide
[ ] sequence assembled by  [ ] overlap of sequenced fragments
                           [ ] homology with related sequence
                           [ ] other. please specify:
[ ] partial:  [ ] N-terminal  [ ] C-terminal  [ ] internal fragment
==============================================================================
length of sequence              [ ] bases or [ ] amino acids
Have you checked for vector contamination?    [ ] yes    [ ] no
------------------------------------------------------------------------------
gene/symbol name(s) (e.g., lacZ)
------------------------------------------------------------------------------
gene product name(s) (e.g., beta-D-galactosidase)
------------------------------------------------------------------------------
Enzyme Commission number (e.g., EC 3.2.1.23)
------------------------------------------------------------------------------
gene product subunit structure (e.g., hemoglobin alpha-2 beta-2)
==============================================================================
The following items refer to the original source of the molecule you have
sequenced.
Please include classification information for unusual, non-standard
organisms, if known:

organism (species) (e.g., Mus musculus)    subspecies    plant cultivar
------------------------------------------------------------------------------
strain (e.g., K12, BALB/c)                 substrain
------------------------------------------------------------------------------
name/number of individual/isolate (e.g., patient 123)
------------------------------------------------------------------------------
laboratory host                            specific (natural) host
------------------------------------------------------------------------------
developmental stage                        [ ] germ line    [ ] rearranged
------------------------------------------------------------------------------
haplotype              tissue type              cell type
------------------------------------------------------------------------------
allele                 variant                  [ ] macronuclear
==============================================================================
The following items refer to the immediate experimental source of the
submitted sequence.

name of cell line (e.g., HeLa; 3T3-L1) or plant cultivar
------------------------------------------------------------------------------
clone library                              clone(s), subclone(s)
==============================================================================
The following items refer to the position of the submitted sequence in the
genome.

chromosome (or segment) name/number
------------------------------------------------------------------------------
map position          units: [ ] genome %  [ ] nucleotide number  [ ] other:
==============================================================================
Using single words or short phrases, describe the properties of the sequence
in terms of:

- its associated phenotype(s);
- the biological/enzymatic activity of its product;
- the general functional classification of the gene and/or gene product;
- macromolecules to which the gene product can bind (e.g., DNA, calcium,
  other proteins);
- subcellular localization of the gene product;
- homology (>100bp/30aa);
- tissues in which protein/mRNA is expressed;
- any other relevant information.
==============================================================================

IV. FEATURES OF THE SEQUENCE

Please list below the types and locations of all significant features
experimentally identified within the sequence. Be sure that your sequence is
numbered beginning with "1". Use < or > if a feature extends beyond the
beginning or end of the indicated sequence span.

In the column marked    fill in
--------------------    ------------------------------------------------------
feature                 type of feature (see information below)
from                    number of first base/amino acid in the feature
to                      number of last base/amino acid in the feature
bp                      an "x" if numbering refers to position of a base pair
                        in a nucleotide sequence
aa                      an "x" if numbering refers to position of an amino
                        acid residue in a peptide sequence
id                      indicate method by which the feature was identified.
                        E = experimentally; S = by similarity with known
                        sequence or to an established consensus sequence;
                        P = by similarity to some other pattern, such as an
                        open reading frame
comp                    an "x" for a nucleotide sequence feature located on
                        strand complementary to that reported here

Significant features include:

- regulatory signals (e.g., promoters, attenuators, enhancers)
- transcribed regions (e.g., mRNA, rRNA, tRNA).
  (indicate reading frame if start and stop codons are not present)
- regions subject to post-transcriptional modification (e.g., introns,
  modified bases)
- translated regions (include stop-codon in coding sequence)
- extent of signal peptide, prepropeptide, propeptide, mature peptide
- other domains/sites of interest (e.g., extracellular domain, DNA-binding
  domain, active site, inhibitory site)
- conflicts with sequence data reported by other authors
- variations and polymorphisms

The first 3 lines of the table are filled in with examples.

Note: Give nucleotide coordinates for protein features on a nucleotide
sequence (e.g. signal peptide, mature peptide, etc.)
==============================================================================
Numbering for features on submitted sequence
[ ] matches manuscript    [ ] does not match manuscript
==============================================================================
feature                  from      to        bp    aa    id    comp
------------------------------------------------------------------------------
EXAMPLE TATA box         276       282       x           S
------------------------------------------------------------------------------
EXAMPLE exon 1           301       445       x
------------------------------------------------------------------------------
EXAMPLE sig_peptide      333       375       x
==============================================================================
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
------------------------------------------------------------------------------
==============================================================================
For E-mail submissions, include your sequence in electronic form here:

SEQUENCE:

END OF SEQUENCE
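
As an illustration of the requirements listed under FORMATS FOR SUBMITTED
DATA (author names included, the same number of residues on every line,
length clearly indicated, numbering beginning with "1", one-letter code,
and no line longer than 80 characters), the following Python sketch lays
out a peptide sequence for pasting between SEQUENCE: and END OF SEQUENCE.
It is not an official SWISS-PROT or EBI tool. The function name
format_sequence, the "Authors:" and "Length:" labels, the default of 60
residues per line, and the accepted alphabet (the 20 standard one-letter
codes plus B, Z and X) are all assumptions made for the example; consult the
IUPAC-IUB references cited above for the authoritative code.

```python
# Assumed alphabet: 20 standard one-letter amino acid codes plus the
# ambiguity codes B, Z and X. Check the IUPAC-IUB references before relying
# on this set.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY" + "BZX")


def format_sequence(seq, authors, per_line=60, max_width=80):
    """Return the lines of a plain-text sequence block.

    seq       peptide sequence; whitespace and letter case are ignored
    authors   author names to list with the sequence
    per_line  residues on each line (the last line may be shorter)
    max_width maximum permitted line length in characters
    """
    cleaned = "".join(seq.split()).upper()
    unknown = sorted(set(cleaned) - AMINO_ACIDS)
    if unknown:
        raise ValueError("characters outside the one-letter code: "
                         + ", ".join(unknown))

    # Hypothetical header layout: the form only asks that authors and
    # length be stated, not how.
    lines = [f"Authors: {authors}", f"Length: {len(cleaned)} residues"]

    # Each sequence line starts with the position of its first residue,
    # so numbering begins with 1 and runs from the amino terminus.
    for start in range(0, len(cleaned), per_line):
        chunk = cleaned[start:start + per_line]
        lines.append(f"{start + 1:>8} {chunk}")

    too_long = [line for line in lines if len(line) > max_width]
    if too_long:
        raise ValueError(f"{len(too_long)} line(s) exceed {max_width} characters")
    return lines
```

Calling print("\n".join(format_sequence(my_sequence, "Author A., Author B.")))
would print the block for a sequence held in the (hypothetical) variable
my_sequence. Rejecting unknown characters outright, rather than silently
dropping them, is a deliberate choice here: a stray character more often
signals a transcription error than an intended residue.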