stackPACK v2.2

Input data format descriptions

Files used as input for stackPACK may be in the following formats:

  • GenBank flatfile format
    • GenBank flatfile format is defined as the format of sequence entries in the GenBank database or as downloaded from the NCBI web site (e.g., Entrez search results) when GenBank format is specified. The full GenBank format specification is found in section 3.4 of the GenBank release notes.
    • Use GenBank format to parse the maximum amount of annotation information on import.
    • NCBI entries that have been deleted and replaced by a new entry with a different accession number may have the old accession number appended to the new accession number within the GenBank ACCESSION field, e.g. U15570 L36804. In these cases stackPACK strips everything after the space.
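
    The stripping rule for a multi-valued ACCESSION field can be illustrated with a minimal Python sketch. The function name primary_accession is hypothetical and is not part of stackPACK; the sketch only mirrors the behaviour described above (keep the text before the first space).

```python
def primary_accession(accession_field: str) -> str:
    """Return the text before the first space of an ACCESSION value.

    Hypothetical helper: it mimics the documented rule, not stackPACK's code.
    """
    # Drop surrounding whitespace, then keep only the first token.
    return accession_field.strip().split(" ", 1)[0]


if __name__ == "__main__":
    print(primary_accession("U15570 L36804"))  # prints U15570
```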

  • Simple FASTA format
    • >[accession].[direction] [clone ID] (where direction is either "r1" for a 3-Prime clone or "f1" for a 5-Prime clone)
    • e.g.
          >37463.f1 g83244
          ACGTGACTGCTACGTACGGGCGTTACGACTGCTACGATCGCATGC
          TATGTCGTAGCAGCCGTGTACACGTGTTTATTCGTAGGGCTTCTA
          CTCAGTCGTACGTACGTACGT
        
    Note: An accession that itself contains a period, such as "R.C.3746", cannot be used in simple FASTA format, because the program will parse "R" as the accession and "C" as the direction. In these cases the GUESS option should be used.
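
    To see why an accession containing periods is misread, here is a minimal Python sketch of a simple FASTA header parser. It assumes the header is split at periods, with the first piece taken as the accession and the second as the direction, which is consistent with the note above but is not confirmed as stackPACK's actual implementation. The names SimpleHeader and parse_simple_header are hypothetical.

```python
from typing import NamedTuple, Optional


class SimpleHeader(NamedTuple):
    accession: str
    direction: str          # "f1" (5-prime) or "r1" (3-prime) when well formed
    clone_id: Optional[str]


def parse_simple_header(line: str) -> SimpleHeader:
    """Parse '>[accession].[direction] [clone ID]' (assumed splitting rule)."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = line[1:].split()
    if not fields:
        raise ValueError("empty FASTA header")
    # The first whitespace-delimited token carries accession and direction.
    pieces = fields[0].split(".")
    accession = pieces[0]
    direction = pieces[1] if len(pieces) > 1 else ""
    clone_id = fields[1] if len(fields) > 1 else None
    return SimpleHeader(accession, direction, clone_id)


if __name__ == "__main__":
    print(parse_simple_header(">37463.f1 g83244"))
    # An accession with embedded periods is split in the wrong place:
    print(parse_simple_header(">R.C.3746"))
```

    Under this assumed rule, the second call yields the accession "R" and the direction "C", matching the misparse described in the note.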


  • STACK FASTA format
    • >[accession] [gi] | [accession] CLONE: [clone] CLONE_LIB: [clonelib] LEN: [len] FILE [source file] [direction<5-PRIME|3-PRIME>] DEFN: [descriptive text]
    • e.g.
          >T27877 g609975 | T27877 CLONE: 17194 CLONE_LIB: Human Eye LEN: 505 bp FILE gbest3.seq 5-PRIME DEFN: EST19137 Homo sapiens cDNA 5'end
          ACGTGACTGCTACGTACGGGCGTTACGACTGCTACGATCGCATGC
          TATGTCGTAGCAGCCGTGTACACGTGTTTATTCGTAGGGCTTCTA
          CTCAGTCGTACGTACGTACGT
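
    The STACK FASTA header layout can be matched with a regular expression. The following Python sketch is written against the template and the example above only; the pattern, the group names, and the function parse_stack_header are hypothetical, and real STACK headers may contain variations this pattern does not cover.

```python
import re
from typing import Dict, Optional

# Pattern derived from the documented template; an illustration, not stackPACK code.
STACK_HEADER = re.compile(
    r"^>(?P<accession>\S+)\s+(?P<gi>\S+)\s+\|\s+\S+\s+"
    r"CLONE:\s*(?P<clone>.*?)\s+"
    r"CLONE_LIB:\s*(?P<clone_lib>.*?)\s+"
    r"LEN:\s*(?P<length>\d+)(?:\s*bp)?\s+"
    r"FILE:?\s*(?P<source_file>\S+)\s+"
    r"(?P<direction>5-PRIME|3-PRIME)\s+"
    r"DEFN:\s*(?P<definition>.*)$"
)


def parse_stack_header(line: str) -> Optional[Dict[str, str]]:
    """Return the named header fields, or None if the line does not match."""
    match = STACK_HEADER.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    header = (
        ">T27877 g609975 | T27877 CLONE: 17194 CLONE_LIB: Human Eye "
        "LEN: 505 bp FILE gbest3.seq 5-PRIME DEFN: EST19137 Homo sapiens cDNA 5'end"
    )
    for name, value in parse_stack_header(header).items():
        print(f"{name}: {value}")
```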
        
  • NCBI FASTA format
          >gi|4468770|emb|AJ009167.1|TSAJ9167 Trypanosoma sp. 18S rRNA gene, isolate K&A
          ACGTGACTGCTACGTACGGGCGTTACGACTGCTACGATCGCATGC
          TATGTCGTAGCAGCCGTGTACACGTGTTTATTCGTAGGGCTTCTA
          CTCAGTCGTACGTACGTACGT
        
    Note: Clone information in NCBI format is part of the free-form text in the definition line. It is therefore not parsed, and data imported as NCBI format cannot be clone linked.
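
    A minimal Python sketch of splitting an NCBI FASTA header into its pipe-delimited identifier fields follows. The function name parse_ncbi_header and the returned key names are hypothetical. The definition text is returned as a single unparsed string, which reflects the point of the note: any clone information inside it is not extracted.

```python
from typing import Dict


def parse_ncbi_header(line: str) -> Dict[str, str]:
    """Split '>gi|<gi>|<db>|<accession>|<locus> <definition>' into fields."""
    if not line.startswith(">gi|"):
        raise ValueError("not an NCBI gi-style FASTA header")
    # Everything up to the first space is the identifier; the rest is free text.
    identifier, _, definition = line[1:].partition(" ")
    parts = identifier.split("|")
    if len(parts) < 4:
        raise ValueError("identifier has too few '|'-separated fields")
    return {
        "gi": parts[1],
        "database": parts[2],
        "accession": parts[3],  # kept with its version suffix, e.g. AJ009167.1
        "locus": parts[4] if len(parts) > 4 else "",
        "definition": definition,  # free-form text, left unparsed
    }


if __name__ == "__main__":
    header = ">gi|4468770|emb|AJ009167.1|TSAJ9167 Trypanosoma sp. 18S rRNA gene, isolate K&A"
    print(parse_ncbi_header(header))
```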


  • Mixed or unknown FASTA formats

    Files with mixed FASTA header line formats or files with FASTA header lines not described above can also be imported.
    • If stackPACK does not identify one of its pre-defined FASTA headers, the program determines an accession number for the sequence entry by extracting all valid characters (alphanumeric and punctuation such as "_" or ".") found between the > and the first space.
    • Sequences will ONLY be imported if there are 255 or fewer valid characters between the > and the first space. Ensure that this string of up to 255 characters is unique for each sequence submitted for processing.
    • Where possible, other details in the header are also parsed. Otherwise, the remainder of the line is ignored.
    • The minimum requirement for a FASTA format input file to stackPACK is:
      >[accession number]
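
    The fallback rule for unrecognised headers can be sketched in Python as below. Two points are assumptions rather than documented behaviour: the set of valid characters is taken to be letters, digits, "_" and "." only (the text above gives these as examples without listing the full set), and invalid characters are assumed to be dropped rather than to end the accession. The names fallback_accession and duplicate_accessions are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

MAX_ACCESSION_LENGTH = 255
# Assumed valid set: alphanumerics, "_" and "."; the real set may be wider.
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9_.]")


def fallback_accession(line: str) -> Optional[str]:
    """Derive an accession from the text between '>' and the first space.

    Returns None when the line is not a header, when no valid characters
    remain, or when more than 255 valid characters are found.
    """
    if not line.startswith(">"):
        return None
    token = line[1:].split(" ", 1)[0]
    accession = _INVALID_CHARS.sub("", token)
    if not accession or len(accession) > MAX_ACCESSION_LENGTH:
        return None
    return accession


def duplicate_accessions(headers: Iterable[str]) -> List[str]:
    """List derived accessions that occur more than once in the input."""
    counts = Counter(
        acc for acc in (fallback_accession(h) for h in headers) if acc is not None
    )
    return [acc for acc, n in counts.items() if n > 1]


if __name__ == "__main__":
    headers = [">seq_1.a first read", ">seq_2.a second read", ">seq_1.a repeat"]
    print([fallback_accession(h) for h in headers])
    print(duplicate_accessions(headers))
```

    Running a check like duplicate_accessions over a file before import is one way to confirm that the derived accessions are unique.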

Copyright 1999-2002 Electric Genetics