5'RACE sequence tags are sequenced by the Genomic Core Facility at the University of California, San Francisco, using an ABI PRISM 3700 DNA Analyzer. The resulting chromatogram files, which must be named according to specific file naming conventions, are downloaded and processed by the following automated protocol:
Chromatogram files are read by PHRED, and the resulting base calls are trimmed using a base quality cutoff of 14.6. (We determined this value empirically; it gives better results for our experimental protocol than PHRED's default quality cutoff of 30.0.) The trimmed sequences are then reverse-complemented so that each sequence tag represents the coding strand.
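The trimming and reverse-complement steps can be sketched as follows. This is a minimal illustration, not PHRED's own code: it assumes the 14.6 cutoff is a Phred-scale quality, converts it to an error probability (p = 10^(-Q/10)), and keeps the highest-scoring contiguous region using a modified Mott algorithm, a common quality-trimming technique. The function names `mott_trim` and `reverse_complement` are hypothetical.

```python
# Sketch only: assumes the cutoff is a Phred quality and that trimming keeps the
# single best-scoring window (modified Mott algorithm). PHRED may trim differently.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def mott_trim(bases: str, quals: list, cutoff_q: float = 14.6) -> str:
    """Return the highest-scoring contiguous region of `bases`.

    Each base scores (p_cutoff - p_error): bases better than the cutoff score
    positive, worse bases score negative. The maximal-sum window is found with
    Kadane's algorithm; an empty string means no base passed the cutoff.
    """
    p_cutoff = 10 ** (-cutoff_q / 10)
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += p_cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            # The window so far is a net loss; restart after this base.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return bases[best_start:best_end]


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string so the tag reads on the coding strand."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    reads = "NNAAACCCGGNN"
    quals = [5, 5] + [30] * 8 + [5, 5]
    trimmed = mott_trim(reads, quals)
    print(trimmed, reverse_complement(trimmed))
```

With two low-quality bases at each end, the sketch keeps the eight Q30 bases (`AAACCCGG`) and then reports their reverse complement (`CCGGGTTT`).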
Poly-T and vector sequence removal
Poly-T tails at the 5' end of the sequence tags are a result of our 5'RACE procedure. Any poly-T tail of eight or more T's is removed automatically.
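A minimal sketch of the poly-T rule, assuming a "poly-T tail" means an uninterrupted run of T's starting at the first base of the tag (a tolerant definition allowing mismatches inside the run would need a different pattern). The function `strip_poly_t` is hypothetical.

```python
import re


def strip_poly_t(seq: str, min_len: int = 8) -> str:
    """Remove a leading run of at least `min_len` T's from the 5' end.

    Assumption: the run must be uninterrupted; runs shorter than `min_len`
    are left in place, matching the "eight or more T's" rule.
    """
    # re.match anchors at the start of the string, i.e. the 5' end of the tag.
    match = re.match(rf"[Tt]{{{min_len},}}", seq)
    return seq[match.end():] if match else seq


if __name__ == "__main__":
    print(strip_poly_t("TTTTTTTTTTGACGT"))  # ten T's: tail removed
    print(strip_poly_t("TTTTTTTGACGT"))     # seven T's: left unchanged
```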
Gene trap vector sequence is removed from the sequence tags by our vector removal protocol before the sequence data are entered into our blastable database. During vector removal, sequences in which the vector has spliced differently than expected are flagged, and a comment is generated for each flagged sequence. For some alternative splicing events, two versions of the sequence tag are posted. After poly-T and vector removal, sequences longer than 20 bases are deposited into our blastable database, and sequences longer than 40 bases are run through our automated identification protocol.
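The length thresholds that decide where a cleaned tag goes can be expressed as a small routing function. This sketch covers only the two length cutoffs stated above (both strict, "longer than"); it does not model the vector removal protocol itself, and the names `Routing` and `route_tag` are hypothetical.

```python
from dataclasses import dataclass

# Strict thresholds taken from the text: > 20 bases for the blastable database,
# > 40 bases for the automated identification protocol.
DB_MIN_EXCLUSIVE = 20
ID_MIN_EXCLUSIVE = 40


@dataclass
class Routing:
    deposit_in_db: bool       # goes into the blastable database
    run_identification: bool  # goes through automated identification


def route_tag(cleaned_seq: str) -> Routing:
    """Decide where a tag goes after poly-T and vector removal (sketch)."""
    n = len(cleaned_seq)
    return Routing(
        deposit_in_db=n > DB_MIN_EXCLUSIVE,
        run_identification=n > ID_MIN_EXCLUSIVE,
    )


if __name__ == "__main__":
    for length in (20, 21, 41):
        print(length, route_tag("A" * length))
```

Because the identification threshold is higher than the database threshold, every tag sent to identification is also deposited in the database, but tags of 21 to 40 bases are only deposited.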