Lab Notes
This is the on-line tutorial version of our previously presented workshop. There are some aspects
of the on-site communication between lab instructors and participants
that haven't made it into the current documentation for on-line use. Here is a
summary of those comments.
general
The web is a constantly changing environment. The sites given here may have
changed their appearance or even their operation in the time between the
finalization of the materials and their offering. Do your best.
The exercises are designed to mimic real research situations.
Therefore, some of the sequences will not give results with all of the
analysis techniques. Some of the answers will change with time as
more data is deposited into the databases.
lab session one
- Analysis of the mouse genome is not as advanced as that of the human
genome and therefore mouse sequences may not have genomic contig information.
- If an NM_ sequence is not available, use an XM_. Check this
link for clarification of these terms.
- As annotation of genomic information is completed, it is moved from the
htgs database to nr. The more mature a genome study, the less likely its
sequence data will be contained in the htgs database.
- The development of super genomic contigs has created a situation where
some of the contigs found are too big to use in blast2sequences. Cutting the
super contig into parts doesn't appear to work either. Check the size of any
genomic contig you find before using it in the blast2sequences program. If the
contig is longer than 4,500,000 bases, it most likely won't work.
- Use the Swiss-Prot hits in the blastx results as a means of
identifying members of a protein family. See blastx hints
for the reason why.
- DNA-to-protein translations can be tricky, especially when sequences
are made up of both intron and coding regions. Be sure to review the initial
blastn results to gain insight into this issue for each individual case.
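
The contig size check and the NM_/XM_ preference described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example accessions are made up for this sketch, and the 4,500,000-base figure is the rule of thumb from these notes, not a limit documented by the blast2sequences program itself.

```python
# Rule-of-thumb limit taken from the notes above (not an official program limit).
MAX_CONTIG_BASES = 4_500_000


def fasta_length(fasta_text):
    """Count sequence characters in a single-record FASTA string, skipping the header."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def likely_fits_blast2sequences(length):
    """Return True if the contig is no longer than the rule-of-thumb limit."""
    return length <= MAX_CONTIG_BASES


def pick_reference(accessions):
    """Prefer an NM_ accession, fall back to an XM_ one, else return None."""
    for prefix in ("NM_", "XM_"):
        for accession in accessions:
            if accession.startswith(prefix):
                return accession
    return None


# Hypothetical usage with placeholder data (not real database records).
contig = ">example_contig\nACGTACGT\nACGT\n"
print(fasta_length(contig), likely_fits_blast2sequences(fasta_length(contig)))
print(pick_reference(["XM_EXAMPLE1", "NM_EXAMPLE2"]))
```

Checking the length before pasting a contig into the program saves a failed run; the accession helper simply encodes "use an NM_ if there is one, otherwise an XM_".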
lab session two
- Not all proteins contain motifs or signatures. Not all means of
identifying such patterns agree with one another.
- Multiple alignment programs will align any sequences given
them. With additional motif or signature information on the sequences,
the quality of the resulting alignment can be better evaluated. Protein family
members should share such patterns in their alignments. If they don't, there is a
problem.
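
One rough way to act on the last point is to check, before trusting an alignment, which sequences actually contain the expected motif. The sketch below assumes the motif is written in PROSITE-style pattern syntax and converts it to a regular expression; the pattern and the sequences are invented for illustration, and the conversion handles only the common syntax elements. It tests for presence of the motif only, so whether the motif lines up in the same alignment columns still has to be checked by eye.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern (common elements only) to a regex string."""
    regex = pattern.rstrip(".")          # patterns conventionally end with a period
    regex = regex.replace("-", "")       # element separator
    regex = regex.replace("x", ".")      # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = repeat counts
    regex = regex.replace("<", "^").replace(">", "$")   # N-/C-terminal anchors
    return regex


def sequences_missing_motif(sequences, pattern):
    """Return the names of sequences in which the motif is not found."""
    motif = re.compile(prosite_to_regex(pattern))
    return [name for name, seq in sequences.items() if not motif.search(seq.upper())]


# Hypothetical family members and an invented pattern.
family = {"seq_a": "AACGGCAS", "seq_b": "MKCLLCTTAA", "seq_c": "AACGGCPS"}
print(sequences_missing_motif(family, "C-x(2)-C-{P}-[ST]."))
```

Any sequence reported here is a candidate for closer inspection: it may not belong to the family, or the motif definition may be too strict.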
lab session three
- Secondary structure prediction techniques usually don't agree with
one another. The Chou/Fasman site used here double-predicts residues: it is
quite common to find a residue listed as both a helix and a sheet. Go for
trends. Discount the double-predicted Chou/Fasman line if the other two
techniques agree at that point. At times it is more important that a
structural element be predicted in a location than what its actual type is.
- Hydrophobic stretches generally need to be at least 20 residues long to be
considered a possible transmembrane region.
- The blastp results may find a PDB structure that wasn't given in the
list of structures associated with a Pfam signature. All databases have
lag time between when data becomes available and when it is incorporated
into their framework. Go with the blastp hit.
- Even when a blastp hit says there is a match to a given PDB data file,
the matched residues may be missing from the actual structure.
- Not all PDB structures have complete coordinate data given for each
residue they contain. Some consist only of backbone information.
These will not display cartoon images in RasMol.
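
The 20-residue rule of thumb for transmembrane candidates can be illustrated with a short scan for uninterrupted hydrophobic runs. This is a deliberately simple sketch: the set of residues treated as hydrophobic is an assumption made for this example, and real prediction tools typically use hydropathy scales and sliding windows rather than strict runs.

```python
# Assumption for this sketch: residues treated as hydrophobic.
HYDROPHOBIC = set("AVLIMFWC")
# Minimum length from the notes above.
MIN_TM_LENGTH = 20


def hydrophobic_stretches(sequence, min_length=MIN_TM_LENGTH):
    """Return (start, end) 1-based inclusive positions of hydrophobic runs
    that are at least min_length residues long."""
    stretches = []
    run_start = None
    for i, residue in enumerate(sequence.upper()):
        if residue in HYDROPHOBIC:
            if run_start is None:
                run_start = i            # a new run begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                stretches.append((run_start + 1, i))
            run_start = None             # the run (if any) has ended
    # Handle a run that extends to the end of the sequence.
    if run_start is not None and len(sequence) - run_start >= min_length:
        stretches.append((run_start + 1, len(sequence)))
    return stretches


# Hypothetical sequence: 20 leucines flanked by charged residues.
example = "KK" + "L" * 20 + "DE" + "A" * 10
print(hydrophobic_stretches(example))
```

A strict run like this misses real transmembrane segments that contain an occasional polar residue, so treat an empty result as "no obvious candidate" rather than proof that none exists.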
Copyright 2003 Regents of the University of California.
All rights reserved.