All splicing-event identification and vector removal take place after data processing of the sequence tags has been completed.
Splicing Events
- Properly spliced sequence
In a correct splicing event, one of our gene trap vectors inserts into an intron of a gene. After transcription and intron splicing, the upstream exon of the trapped gene is joined directly to the splice acceptor sequence/Engrailed-2 (En2) exon, which is part of the vector. No vector intron sequence should remain. Proper splicing events are therefore identified automatically by screening gene tags for the presence of splice acceptor/En2 exon sequence and the absence of vector intron sequence, e.g.:
- Found:
- GTCCCAGGTCCCGAAAA (En2 exon)
- But not:
- ctccatgacaaccagGTCCCAGGTCCCGAA
(lowercase = vector intron sequence upstream of splice acceptor)§
If vector intron sequence is found in addition to En2 exon sequence, the gene tag is not posted.
§ Note: Because sequence quality beyond the splice junction is usually poor for these events, sequences that are similar, but not necessarily identical, to the vector intron sequence are also flagged.
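The screen described above can be sketched in Python as follows. This is a minimal illustration, not our production code. It assumes that the only vector intron probe is the 15 nt shown in the example above (`CTCCATGACAACCAG`) and that the En2 probe is the first 15 nt of the En2 exon sequence, so the truncated read in the example still matches. It also approximates the "similar, but not necessarily identical" rule from the note with a hypothetical threshold of at most two mismatches (Hamming distance) in any window of the tag.

```python
# Minimal sketch of the proper-splice screen. Probe sequences are copied from
# the examples above; MAX_MISMATCHES is a hypothetical threshold.
EN2_PROBE = "GTCCCAGGTCCCGAA"         # first 15 nt of the splice acceptor/En2 exon
VECTOR_INTRON_3P = "CTCCATGACAACCAG"  # vector intron just upstream of the splice acceptor
MAX_MISMATCHES = 2                    # hypothetical tolerance for low-quality reads


def min_mismatches(tag: str, probe: str) -> int:
    """Smallest Hamming distance between probe and any equal-length window of tag."""
    if len(tag) < len(probe):
        return len(probe)
    return min(
        sum(a != b for a, b in zip(tag[i:i + len(probe)], probe))
        for i in range(len(tag) - len(probe) + 1)
    )


def screen_proper_splice(tag: str) -> str:
    """Return 'properly_spliced', 'vector_intron_flagged', or 'no_en2_exon'."""
    tag = tag.upper()  # lowercase in the examples is display notation only
    if EN2_PROBE not in tag:
        return "no_en2_exon"
    # Near-matches count too, because read quality past the junction is poor.
    if min_mismatches(tag, VECTOR_INTRON_3P) <= MAX_MISMATCHES:
        return "vector_intron_flagged"  # such gene tags are not posted
    return "properly_spliced"


print(screen_proper_splice("ACGTACGTAC" + "GTCCCAGGTCCCGAAAA"))  # properly_spliced
print(screen_proper_splice("ctccatgacaaccagGTCCCAGGTCCCGAA"))    # vector_intron_flagged
```

A sliding-window Hamming distance is used here only because it is simple. It tolerates substitutions but not insertions or deletions. An edit-distance or alignment-based search would be the natural replacement if indels are common in these reads.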
- SD1 exon trap (cryptic splice donor 1 in vector intron is activated)
There are two cryptic splice donor sites in the vector intron that are activated when the vector inserts into an exon of a gene. The first (SD1) is about 30 bp from the 5' end of the vector; the second (SD2) is about 420 bp from the 5' end.
- Found:
- gaagaggaaccgaaaGTCCCAGGTCCCGAAAA
- SD2 exon trap (cryptic splice donor 2 in vector intron is activated)
- Found:
- tgggtttgccctttgGTCCCAGGTCCCGAAAA
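Below is a minimal sketch of one way to recognize an SD1 or SD2 exon trap: look for the vector intron sequence immediately upstream of each cryptic donor, fused directly to the En2 exon. The junction strings are taken from the two examples above. The exact-match test and the 15 nt probe lengths are assumptions, and a real screen would probably need to tolerate sequencing errors.

```python
from typing import Optional

# Junctions copied from the SD1/SD2 examples above: the last 15 nt of vector
# intron before each cryptic donor, followed by the first 15 nt of the En2 exon.
EN2_PROBE = "GTCCCAGGTCCCGAA"
CRYPTIC_DONOR_JUNCTIONS = {
    "SD1_exon_trap": "GAAGAGGAACCGAAA" + EN2_PROBE,
    "SD2_exon_trap": "TGGGTTTGCCCTTTG" + EN2_PROBE,
}


def detect_cryptic_donor(tag: str) -> Optional[str]:
    """Return the donor whose junction with the En2 exon occurs in tag, else None."""
    tag = tag.upper()  # case in the examples is display notation only
    for label, junction in CRYPTIC_DONOR_JUNCTIONS.items():
        if junction in tag:
            return label
    return None


print(detect_cryptic_donor("gaagaggaaccgaaaGTCCCAGGTCCCGAAAA"))  # SD1_exon_trap
print(detect_cryptic_donor("tgggtttgccctttgGTCCCAGGTCCCGAAAA"))  # SD2_exon_trap
```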
- No splice acceptor present (exon trap)
Sometimes the splice acceptor is lost as a consequence of the insertion. This occurs more often when the gene trap vector inserts into an exon of a gene rather than into an intron.
- Could not find:
- GTCCCAGGTCCCGAAAA
(splice acceptor sequence)
- Unspliced sequence
A sequence is characterized as unspliced if it contains vector intron sequence and no splice acceptor/En2 exon sequence. These sequences are not kept in the database. An example:
- Found:
- caaccagtaacctctgccctttctcctccatgacaaccag
(vector intron sequence upstream of splice acceptor)
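Putting the cases together, one possible decision order is sketched below. It is a hedged illustration with several simplifications. It uses exact substring tests only, whereas the note above says near-matches to the vector intron are also flagged. It reuses 15 nt probes taken from the examples. It also checks the SD1/SD2 junctions first, because those tags contain En2 exon sequence too and would otherwise look properly spliced. The enum and its member names are hypothetical.

```python
from enum import Enum

# 15 nt probes copied from the examples in this section (lengths are assumptions).
EN2_PROBE = "GTCCCAGGTCCCGAA"  # start of the splice acceptor/En2 exon
SA_INTRON = "CTCCATGACAACCAG"  # vector intron just upstream of the splice acceptor
SD1_JUNCTION = "GAAGAGGAACCGAAA" + EN2_PROBE
SD2_JUNCTION = "TGGGTTTGCCCTTTG" + EN2_PROBE


class SpliceEvent(Enum):
    PROPERLY_SPLICED = "properly spliced"
    SD1_EXON_TRAP = "SD1 exon trap"
    SD2_EXON_TRAP = "SD2 exon trap"
    NO_SPLICE_ACCEPTOR = "no splice acceptor (exon trap)"
    UNSPLICED = "unspliced"
    EN2_WITH_INTRON = "En2 exon plus vector intron"


# Events whose tags are not posted (En2 plus intron) or not kept (unspliced).
DISCARDED = {SpliceEvent.UNSPLICED, SpliceEvent.EN2_WITH_INTRON}


def classify(tag: str) -> SpliceEvent:
    tag = tag.upper()
    # Cryptic-donor junctions first: these tags also contain En2 sequence.
    if SD1_JUNCTION in tag:
        return SpliceEvent.SD1_EXON_TRAP
    if SD2_JUNCTION in tag:
        return SpliceEvent.SD2_EXON_TRAP
    has_en2 = EN2_PROBE in tag
    has_intron = SA_INTRON in tag
    if has_en2:
        return SpliceEvent.EN2_WITH_INTRON if has_intron else SpliceEvent.PROPERLY_SPLICED
    return SpliceEvent.UNSPLICED if has_intron else SpliceEvent.NO_SPLICE_ACCEPTOR


for example in [
    "ctccatgacaaccagGTCCCAGGTCCCGAA",
    "gaagaggaaccgaaaGTCCCAGGTCCCGAAAA",
    "caaccagtaacctctgccctttctcctccatgacaaccag",
]:
    event = classify(example)
    print(event.value, "(discarded)" if event in DISCARDED else "(posted)")
```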
For every case above in which a gene tag is posted, vector sequence is removed before the tag is posted on the browse page. For the no-splice-acceptor case, gene tags are posted in two versions:
- sequence prior to poly-T trimming and vector removal
- sequence after poly-T trimming and vector removal
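For the no-splice-acceptor case, the two posted versions could be produced roughly as in the sketch below. This page does not say where the poly-T stretch sits or which vector sequence is removed, so both are assumptions here. A leading run of at least 10 T's is treated as poly-T, which is a hypothetical threshold. `vector_probe` is a hypothetical, caller-supplied vector sequence at which the tag is cut, and only the part upstream of the cut is kept.

```python
import re
from typing import NamedTuple

# Hypothetical: a run of 10 or more T's at the start of the read counts as poly-T.
LEADING_POLY_T = re.compile(r"^T{10,}")


class PostedVersions(NamedTuple):
    raw: str      # sequence prior to poly-T trimming and vector removal
    trimmed: str  # sequence after poly-T trimming and vector removal


def remove_vector(tag: str, vector_probe: str) -> str:
    """Keep only the sequence upstream of the first exact match to vector_probe."""
    cut = tag.find(vector_probe)
    return tag if cut == -1 else tag[:cut]


def no_splice_acceptor_versions(tag: str, vector_probe: str) -> PostedVersions:
    tag = tag.upper()
    trimmed = LEADING_POLY_T.sub("", tag)
    trimmed = remove_vector(trimmed, vector_probe.upper())
    return PostedVersions(raw=tag, trimmed=trimmed)


versions = no_splice_acceptor_versions(
    "T" * 12 + "ACGTAGCTAGGA" + "ACCGGTACCGGT" + "AAA", "ACCGGTACCGGT"
)
print(versions.trimmed)  # ACGTAGCTAGGA
```

`ACCGGTACCGGT` is a made-up placeholder, not the sequence of any of our vectors.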
Following vector removal, our automated annotation procedure is run.