SCF file name conventions and data backup info

SCF data storage relies on a systematic naming scheme for sequence data files. There are several file types: trace files, which contain the raw data from the sequencing run; text files, which contain the DNA sequence produced by the automatic ABI base caller; and the FASTA and GCG files derived from them. In addition, there are files based on PHRED base calling, trimmed according to PHRED20 scores, together with their corresponding FASTA and GCG files.

The basic nomenclature for sequence data file names at SCF is:

  • normal requests:
    LaneNo-requestNo-username-template-primer
    Example: 64-011223-weisshaa-pUC9-rev
  • bigorder requests:
    LaneNo-KEY1requestNo-libraryNo-plateNo-platePositionKEY2-primer
    Example1: 02-E012345-006-199-B07-T7
    Example2: 02-E012345-006-199-A01Q-T7
  • Explanation of file name segments:
    • LaneNo: number of the lane on the capillary array of the sequencer
    • requestNo: 6-digit ID number of a request, i.e. a set of sequencing reactions that belong to one order of up to 96 reactions
      • version flag: a letter identifying a repeated analysis of the same template and primer combination from the same request ("w" for the first repetition, then "x", and so on)
    • username: UNIX username of the user who placed the sequencing order (sequencing request)
    • template: 4-character designation of the template DNA in a sequencing reaction
    • KEY1: identifier of the request (order) type, see table below
    • libraryNo: 3-digit number that identifies a unique library (collection of clones or templates)
    • plateNo: 3-digit number that identifies a unique plate (96- or 384-well) within a library
    • platePosition: well position in a 96- or 384-well microtiter plate which identifies a clone or a template
    • KEY2: identifier of a specific workflow type within bigorder-projects, see table below
    • primer: oligonucleotide used to sequence the template
  • File name extensions:
    Trace files have no extension; sequence text files in GCG format have the extension ".Seq", and text files in FASTA format have the extension ".fasta". The data distribution scripts (see below) use these extensions to identify the file type.
    Example:
    02-E012345-006-199-B07-T7 (saved to backup)
    64-011223-weisshaa-pUC9-rev (saved to backup)
    64-011223-weisshaa-pUC9-rev.Seq (copied to backup)
    64-011223-weisshaa-pUC9-rev.fasta (copied to backup)
  • version flag:
    If a reaction is repeated, the new file for the same reaction in the same request gets a "w-flag" (from German "wiederholt", "repeated"). This flag is inserted after the request number. The second repetition gets an "x", and so on up to "z".
    Note that in addition to the version flag, the repeated file usually also receives a different lane number.
  • KEY1 values:
    • B: analyse file by BLAST (script-based blasting, discontinued)
    • E: EST sequence
    • L: Library sequence reaction
    • S: Sugarbeet EST sequence (files are delivered to BeetBase)
    • K: GABI-Kat sequence (files are delivered to Yong Li's GABI-Kat LIMS system)
    • P: sequencing Project (analysed by primer-walking)
    • C: automatic base calling Checked by a human
    • G: sequence files are delivered to a defined person (Group leader) in addition to the user
  • KEY2 values:
    • q: quality control reaction from the bottom plate area (to check plate orientation in E/L/S requests)
    • Q: quality control reaction from the top plate area (to check plate orientation in E/L/S requests)
    • B: reaction selected from a plate based on PHRED20 values; all wells with "bad" results are selected for re-analysis
    • S: free selection from a given library plate
    • C: confirmation of a GABI-Kat line result in the next generation of the plant line
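The naming scheme above is regular enough to be parsed mechanically. The following is a minimal Python sketch, not the facility's actual distribution script. It assumes that segments are separated by hyphens, that username and template contain no hyphens (the primer may), that a well position is a row letter A to P followed by a two-digit column, and that a version flag (w to z) follows the request number directly. The function name `parse_scf_name` and the short KEY descriptions are hypothetical.

```python
import re

# Hypothetical lookup tables paraphrasing the KEY1/KEY2 lists above.
KEY1 = {
    "B": "BLAST analysis (discontinued)",
    "E": "EST sequence",
    "L": "library sequence reaction",
    "S": "sugar beet EST sequence (BeetBase)",
    "K": "GABI-Kat sequence",
    "P": "sequencing project (primer walking)",
    "C": "base calling checked by a human",
    "G": "also delivered to a group leader",
}
KEY2 = {
    "q": "QC reaction, bottom plate area",
    "Q": "QC reaction, top plate area",
    "B": "re-run of wells with bad PHRED20 values",
    "S": "free selection from a library plate",
    "C": "GABI-Kat confirmation in the next generation",
}

# Normal request: LaneNo-requestNo[flag]-username-template-primer
NORMAL = re.compile(
    r"^(?P<lane>\d+)-(?P<request>\d{6})(?P<version>[w-z])?-"
    r"(?P<username>[^-]+)-(?P<template>[^-]+)-(?P<primer>.+)$"
)
# Bigorder: LaneNo-KEY1requestNo[flag]-libraryNo-plateNo-platePositionKEY2-primer
BIGORDER = re.compile(
    r"^(?P<lane>\d+)-(?P<key1>[BELSKPCG])(?P<request>\d{6})(?P<version>[w-z])?-"
    r"(?P<library>\d{3})-(?P<plate>\d{3})-"
    r"(?P<well>[A-P]\d{2})(?P<key2>[qQBSC])?-(?P<primer>.+)$"
)


def parse_scf_name(name: str) -> dict:
    """Split an SCF file name (without extension) into its segments."""
    m = BIGORDER.match(name)
    if m:
        fields = m.groupdict()
        fields["kind"] = "bigorder"
        fields["key1_meaning"] = KEY1[fields["key1"]]
        # KEY2 is optional; KEY2.get(None) yields None when it is absent.
        fields["key2_meaning"] = KEY2.get(fields["key2"])
        return fields
    m = NORMAL.match(name)
    if m:
        fields = m.groupdict()
        fields["kind"] = "normal"
        return fields
    raise ValueError(f"not an SCF sequence file name: {name!r}")


if __name__ == "__main__":
    for example in ("64-011223-weisshaa-pUC9-rev",
                    "02-E012345-006-199-B07-T7",
                    "02-E012345-006-199-A01Q-T7"):
        print(parse_scf_name(example))
```

Trying the bigorder pattern first is safe because a normal name has only digits in its second segment and can never match the KEY1 letter.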
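Because the distribution scripts identify file types by their extension, the check could look roughly like the following hypothetical Python sketch (not the actual script; the name `split_extension` is invented). It treats every name without ".Seq" or ".fasta" as a trace file, so it would misclassify the PHRED-derived files, whose extensions are not specified in this section. It uses `str.endswith` rather than a generic suffix split so that a dot inside a template name is not mistaken for an extension.

```python
from typing import Tuple

# Extensions named in the section above; matching is case-sensitive (".Seq").
EXTENSIONS = {
    ".Seq": "GCG sequence text",
    ".fasta": "FASTA sequence text",
}


def split_extension(filename: str) -> Tuple[str, str]:
    """Return (base name, file type) following the SCF extension convention.

    Anything without a recognised extension is assumed to be a trace file.
    """
    for ext, kind in EXTENSIONS.items():
        if filename.endswith(ext):
            return filename[: -len(ext)], kind
    return filename, "trace"


if __name__ == "__main__":
    for f in ("64-011223-weisshaa-pUC9-rev",
              "64-011223-weisshaa-pUC9-rev.Seq",
              "64-011223-weisshaa-pUC9-rev.fasta"):
        print(f, "->", split_extension(f))
```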
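The version-flag rule for repeated reactions is a short, fixed sequence (w, x, y, z). A minimal Python sketch follows. It assumes the flag is written directly after the request number with no separator (the section only says "after the request number"), that the same position applies to bigorder names (after KEY1 and the request number), and that the caller supplies the new lane number. `next_version_flag` and `repeat_name` are hypothetical helper names.

```python
from typing import Optional

# Repetition order described above: first repeat "w", then "x", up to "z".
VERSION_FLAGS = "wxyz"


def next_version_flag(current: Optional[str]) -> str:
    """Return the flag for the next repetition of a reaction."""
    if not current:
        return "w"
    if len(current) != 1 or current not in VERSION_FLAGS:
        raise ValueError(f"unknown version flag: {current!r}")
    position = VERSION_FLAGS.index(current)
    if position == len(VERSION_FLAGS) - 1:
        raise ValueError("no version flag left after 'z'")
    return VERSION_FLAGS[position + 1]


def repeat_name(name: str, new_lane: str) -> str:
    """Build the file name for a repeated reaction (hypothetical helper).

    Assumes the flag sits at the end of the second hyphen-separated
    segment, i.e. directly after the request number.
    """
    _old_lane, request, rest = name.split("-", 2)
    flag = request[-1] if request[-1] in VERSION_FLAGS else None
    base = request[:-1] if flag else request
    return f"{new_lane}-{base}{next_version_flag(flag)}-{rest}"
```

For example, `repeat_name("64-011223-weisshaa-pUC9-rev", "17")` gives "17-011223w-weisshaa-pUC9-rev", and repeating that file again produces the "x" variant.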