Lab Notes

This is the online tutorial version of a workshop we previously presented on site. Some of what lab instructors and participants discussed in person has not made it into the current documentation for online use. The notes below summarize those comments.

general

lab session one
  1. Analysis of the mouse genome is not as advanced as that of the human genome and therefore mouse sequences may not have genomic contig information.
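  2. If an NM_ sequence is not available, use an XM_ sequence. In RefSeq, NM_ accessions are curated mRNA records and XM_ accessions are computationally predicted model mRNAs. Check this link for clarification of these terms.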
  3. As annotation of genomic information is completed, it is moved from the htgs database to nr. The more mature a genome study, the less likely its sequence data will be contained in the htgs database.
  4. With the development of super genomic contigs, some contigs are now too big to work in blast2sequences. Cutting the super contig into parts also doesn't appear to work. Check the size of any genomic contig you find before using it in the blast2sequences program. If the contig is longer than 4,500,000 bases, it most likely won't work.
  5. Use the Swiss-Prot hits in the blastx results as a means of identifying members of a protein family. See blastx hints for the reason why.
  6. DNA-to-protein translations can be tricky, especially when sequences contain both intron and coding regions. Be sure to review the initial blastn results to gain insight into this issue for each individual case.
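
The size check in note 4 can be done before pasting a contig into blast2sequences. The following is a minimal sketch, assuming the contig has been saved as a FASTA file. The function names and the demo records are hypothetical; only the 4,500,000-base figure comes from the notes above.

```python
import io

# Size above which the notes report that blast2sequences most likely fails.
MAX_CONTIG_BASES = 4_500_000

def fasta_lengths(handle):
    """Return {record_id: sequence_length} for every record in a FASTA stream."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0] if len(line) > 1 else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence lines are summed; line breaks do not count as bases.
            lengths[current] += len(line)
    return lengths

def likely_too_big(length, limit=MAX_CONTIG_BASES):
    """True when a contig is longer than the limit quoted in the notes."""
    return length > limit

# Tiny made-up example; a real contig would be read with open("contig.fasta").
demo = io.StringIO(">contig_a demo\nACGT\nACG\n>contig_b\nNNNN\n")
for name, n in fasta_lengths(demo).items():
    print(name, n, "too big" if likely_too_big(n) else "ok to try")
```

The limit is a rule of thumb from the workshop, not a documented cutoff of the program, so a contig just under it may still fail.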

lab session two
  1. Not all proteins contain motifs or signatures. The different methods of identifying such patterns do not always agree with one another.
  2. Multiple alignment programs will align any sequences given to them. Additional motif or signature information on the sequences makes it easier to evaluate the quality of the resulting alignment. Protein family members should share such patterns in their alignments. If they don't, there is a problem.
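
One way to apply note 2 is to locate a known motif in each aligned sequence and see whether it starts in the same alignment column everywhere. The following is a rough sketch, assuming the alignment is available as gapped strings of equal length with "-" as the gap character. The pattern `C.{2}C` and the three sequences are invented for illustration and are not real signatures or workshop data.

```python
import re

def motif_columns(aligned_seq, pattern):
    """Alignment columns at which the motif starts in one gapped sequence."""
    # Map each ungapped residue index back to its alignment column.
    col_of = [col for col, ch in enumerate(aligned_seq) if ch != "-"]
    ungapped = aligned_seq.replace("-", "")
    # The pattern is assumed to match at least one residue.
    return [col_of[m.start()] for m in re.finditer(pattern, ungapped)]

def shared_motif_columns(alignment, pattern):
    """Columns where every sequence in {name: gapped_seq} starts the motif."""
    per_seq = [set(motif_columns(seq, pattern)) for seq in alignment.values()]
    return set.intersection(*per_seq) if per_seq else set()

# Hypothetical family members and motif, for illustration only.
family = {
    "seq1": "MK-CAACGH",
    "seq2": "MKTCGGCGH",
}
print(shared_motif_columns(family, r"C.{2}C"))

# A sequence lacking the motif leaves no shared column.
family["seq3"] = "MKTAGGAGH"
print(shared_motif_columns(family, r"C.{2}C"))
```

An empty result means either the alignment has shifted the motif or one of the sequences does not carry it, and both cases are worth a closer look.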

lab session three
  1. Secondary structure prediction techniques usually don't agree with one another. The Chou/Fasman site used here double-predicts residues: it is quite common to find a residue listed as both a helix and a sheet. Go for trends. Discount the double-predicted Chou/Fasman line if the other two techniques agree at that point. At times it is more important that a structural element be predicted at a location than what its actual type is.
  2. Hydrophobic stretches generally need to be at least 20 residues long to be considered a possible transmembrane region.
  3. The blastp results may find a PDB structure that wasn't given in the list of structures associated with a Pfam signature. All databases have lag time between when data becomes available and when it is incorporated into their framework. Go with the blastp hit.
  4. Even when a blastp hit reports a match to a given PDB data file, the matched residues may be missing from the actual structure.
  5. Not all PDB structures have complete coordinate data for each residue they contain. Some contain only backbone information. These will not display as cartoon images in RasMol.
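
The 20-residue rule in note 2 can be applied as a sliding-window scan. The following is a minimal sketch, not the method used by the workshop tools. The Kyte-Doolittle hydropathy scale is chosen here as an assumption, and the cutoff of 1.6 for the window average is a commonly quoted value that is also treated as an assumption. Only the window length of 20 comes from the notes.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(seq, window=20, cutoff=1.6):
    """Start positions (0-based) of windows whose mean hydropathy >= cutoff.

    window=20 follows the notes; cutoff=1.6 is an assumed threshold.
    Residues not in the table (for example X) are scored as 0.0.
    """
    seq = seq.upper()
    starts = []
    # Sequences shorter than the window yield no windows at all.
    for i in range(len(seq) - window + 1):
        scores = [KD.get(aa, 0.0) for aa in seq[i:i + window]]
        if sum(scores) / window >= cutoff:
            starts.append(i)
    return starts

# Made-up sequences for illustration.
print(hydrophobic_windows("L" * 25))      # long hydrophobic run
print(hydrophobic_windows("DEKR" * 10))   # charged residues only
```

Any reported window is only a candidate transmembrane region. As with the secondary structure predictions, look for agreement with other evidence before relying on it.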


Copyright 2003 Regents of the University of California. All rights reserved.