The manual identification protocol is used to identify sequence tags that do not satisfy the criteria specified in the automated annotation protocol. It allows for the identification of sequence tags that otherwise would be left unidentified by automated methods. The sequence tags that require manual identification contain one or more regions we describe as "questionable sequence".
Questionable sequence is defined as sequence derived from a poor-quality section of an ABI trace file. Such a section may contain long homopolymeric stretches, poor resolution, overlapping peaks, or peaks obscured by background noise. Homopolymeric regions cause problems because the polymerase has difficulty processing a repeated stretch of one base type, an effect sometimes referred to as slippage or "stuttering". The most common case is slippage during a poly-A stretch, which causes sequencing irregularities downstream of the poly-A region. Another cause of questionable sequence is poor resolution in the gel. Poor resolution is more common near the bottom of the gel, where it becomes more difficult to call bases accurately. Also, overlapping peaks may indicate multiple products. This type of noise is especially problematic when signal strengths are low.
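As an illustration of how homopolymeric stretches might be flagged before the trace file is inspected by eye, the following minimal sketch scans a called sequence for long runs of a single base. The function name `find_homopolymer_runs` and the threshold `min_len=8` are assumptions made for this example only; they are not part of the protocol, and a flagged run still has to be checked against the ABI trace file.

```python
import re


def find_homopolymer_runs(seq, min_len=8):
    """Return (start, end, base) for every run of a single base that is
    at least min_len long. Coordinates are 0-based, end is exclusive.

    min_len is a hypothetical threshold chosen for illustration.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One base, followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    example = "ACGT" + "A" * 10 + "CGT"
    for start, end, base in find_homopolymer_runs(example):
        print(f"poly-{base} run at {start}-{end} ({end - start} bp)")
```

Sequence downstream of a reported poly-A run is the region most likely to contain the slippage irregularities described above.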
If questionable sequence occurs
- over the entire length of the sequence tag, the sequence tag is placed into a separate file and an e-mail is sent to Component 1 informing them that the sequence tag will be removed from the website.
- only at the 5′ end of the sequence tag, that part of the sequence is manually edited out of the sequence tag using the ABI trace file as a guide. The sequence tag is then re-submitted for automated annotation as described in the automated annotation protocol. If the sequence is less than 40 bp after editing, it is removed from the website but remains available through Advanced Download. If the sequence is less than 20 bp after editing, it is removed from the website, archived in a separate table, and not available via Advanced Download.
- only at the 3′ end of the sequence tag, the problematic sequence is indicated by strike-through. That part of the sequence is not used by the automated annotation protocol when the sequence tag is re-submitted for identification.
- in the middle of the sequence tag, and the segments on either side of the questionable sequence both align to the same gene with statistically significant scores, the sequence tag is manually annotated. BLAST is used to determine the best candidate sequences, and the accession numbers of the candidate sequences are entered manually. The candidate sequences are then grouped into synonym lists by the automated process described in the automated annotation protocol. Both the length of each matching segment (entered manually) and the percent identity with the gene (calculated automatically) are displayed on the annotation page.
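The four cases above can be summarized as a small decision function. This is only a sketch of the rules as stated, not the code used by the annotation pipeline: the names `Action` and `triage` are hypothetical, it assumes a single questionable region given as 0-based coordinates with an exclusive end, and it assumes the BLAST comparison of the flanking segments has already been done and is passed in as a boolean. The protocol does not say what happens when the flanking segments do not align to the same gene, so that case is returned as not covered.

```python
from enum import Enum


class Action(Enum):
    WITHDRAW = "place in separate file and notify Component 1"
    TRIM_AND_RESUBMIT = "trim 5' end and re-submit for automated annotation"
    REMOVE_KEEP_DOWNLOAD = "remove from website, keep in Advanced Download"
    REMOVE_AND_ARCHIVE = "remove from website, archive in separate table"
    STRIKE_AND_RESUBMIT = "strike through 3' end and re-submit"
    MANUAL_ANNOTATION = "annotate manually from BLAST candidates"
    NOT_COVERED = "case not described in the protocol"


def triage(tag_length, q_start, q_end, flanks_match_same_gene=False):
    """Choose an action for one questionable region [q_start, q_end)."""
    if not 0 <= q_start < q_end <= tag_length:
        raise ValueError("questionable region must lie within the tag")

    at_5_prime = q_start == 0
    at_3_prime = q_end == tag_length

    if at_5_prime and at_3_prime:
        return Action.WITHDRAW
    if at_5_prime:
        remaining = tag_length - q_end  # length left after manual editing
        if remaining < 20:
            return Action.REMOVE_AND_ARCHIVE
        if remaining < 40:
            return Action.REMOVE_KEEP_DOWNLOAD
        return Action.TRIM_AND_RESUBMIT
    if at_3_prime:
        return Action.STRIKE_AND_RESUBMIT
    if flanks_match_same_gene:
        return Action.MANUAL_ANNOTATION
    return Action.NOT_COVERED


if __name__ == "__main__":
    print(triage(300, 0, 50).value)
    print(triage(80, 0, 50).value)
    print(triage(300, 100, 150, flanks_match_same_gene=True).value)
```

The 20 bp check comes before the 40 bp check because a sequence shorter than 20 bp also satisfies the 40 bp condition but is handled more strictly.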
For all manually edited sequences, we urge you to examine the ABI trace file and to BLAST the sequence tag in order to understand the basis for the identification. In addition, the manually identified sequence tags are regularly subjected to the automated process and may be updated.
Because these cell lines contain anomalies in their ABI trace files, resequencing of manually identified cell lines is strongly recommended before use in biological experiments.