Part A
Protein structural information comes in a variety of forms. The classical way of thinking about structures is to consider only 3D coordinate data. However, because of the limited number of available solved structures as compared to the number of possible proteins, other approaches have been developed to try to determine secondary and even tertiary protein structures.
When Linus Pauling made his prediction in 1948 that proteins would be composed of alpha helix and beta sheet units, no protein structures had yet been determined. His prediction was based solely on the idea that the potential hydrogen bonding possible in such structures would increase their stability and make them more probable. When improvements in x-ray diffraction techniques made it possible to solve protein three-dimensional structures, the predicted structures were there.
The possibility of a given section of a peptide folding to form a helix, a sheet or a turn is primarily dependent on the preferred conformations of the constituent residues and the packing quality of the surface formed. Secondary structure prediction schemes have been devised, with relative success, using only local or semi-local sequence patterns due to this local characteristic of folding forces.
Various secondary structure prediction tools have been developed. These tools use different algorithms and are based on different computational approaches. For more information on some of these tools, check out the following links. The first is printed out in the hard copy version and is located in the Lab III:Hints section.
secondary structure prediction background information
Rob Russell's "Secondary Structure Prediction and links" page
Adrian Shepherd's "Protein Secondary Structure Prediction with Neural Networks: A Tutorial" (UCL)
There are a number of secondary structure prediction sites on the web. The ones listed below represent implementations of a Chou-Fasman prediction, a Garnier, Osguthorpe and Robson (GOR) prediction and a neural net approach to the problem.
Chou-Fasman GOR NNPREDICT
[Chou-Fasman run - input format: fasta file] [GOR run - input format: raw sequence] [NNPREDICT run - input format: raw sequence]
Get a secondary structure prediction for your protein sequence from each of these sites. Print off your results.
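Because the sites above expect different input formats (fasta file versus raw sequence), it can help to convert between the two rather than retyping the sequence. The following is a minimal sketch; the function names `fasta_to_raw` and `raw_to_fasta` are hypothetical helpers written for this illustration, and the 70-character line width is simply an assumption that matches the sequence layout used later in this lab.

```python
def fasta_to_raw(fasta_text):
    """Return the sequence of the first FASTA record as one unbroken string."""
    residues = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts another record; stop here
            seen_header = True
            continue
        residues.append(line.replace(" ", "").upper())
    return "".join(residues)


def raw_to_fasta(name, sequence, width=70):
    """Wrap a raw sequence under a '>' header line (width is an assumption)."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"
```

Paste the output of `fasta_to_raw` into the sites that ask for a raw sequence, and keep the original file for the sites that ask for a fasta file.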
Use your second copy of the GenBank version of your protein sequence and these predictions to see how well they correspond with one another. Use the provided highlighters to mark your protein sequence with the prediction from each technique.
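The comparison described above can also be done programmatically once each prediction has been transcribed by hand into a string with one letter per residue. This is only a sketch under that assumption: the three sites report their results in different layouts, so the transcription step, the state letters (H, E, C) and the function names `agreement` and `agreeing_regions` are all illustrative choices, not part of any of the prediction tools.

```python
def agreement(predictions):
    """Per-residue consensus for equal-length prediction strings.

    Returns a string holding the shared state where every method agrees
    and '.' elsewhere, plus the fraction of positions that agree.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = "".join(
        states[0] if len(set(states)) == 1 else "."
        for states in zip(*predictions)
    )
    n = len(consensus)
    fraction = (n - consensus.count(".")) / n if n else 0.0
    return consensus, fraction


def agreeing_regions(consensus, min_len=4):
    """List (start, end, state) runs of one agreed state, 1-based inclusive."""
    regions = []
    start = 0
    for i in range(1, len(consensus) + 1):
        if i == len(consensus) or consensus[i] != consensus[start]:
            if consensus[start] != "." and i - start >= min_len:
                regions.append((start + 1, i, consensus[start]))
            start = i
    return regions


# Made-up example strings, not real output from any of the three sites.
chou_fasman = "HHHHHCCEEEE"
gor = "HHHHCCCEEEC"
nnpredict = "HHHHHCCEEEE"
consensus, fraction = agreement([chou_fasman, gor, nnpredict])
print(consensus)                       # HHHH.CCEEE.
print(agreeing_regions(consensus, 3))  # [(1, 4, 'H'), (8, 10, 'E')]
```

The region list gives residue ranges that can be checked directly against the motif locations recorded earlier.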
1. How well do the three predictions agree with one another?
2. Are there any areas of your protein where the predictions agree? If so, where are these locations?
3. Do these areas of agreement correspond with any of your earlier found motifs?
Concern over the reliability of secondary structure prediction, together with growth in the number of solved crystal structures, spawned the development of other techniques to predict structure. One technique that attempts to predict tertiary structure from primary sequence is known as threading.
threading background information
Next, check to see if the protein has long stretches of hydrophobic residues indicating phobic regions. This could indicate the central phobic core of the folded protein or possible transmembrane segments.
Go to the Weizmann Institute's Hydropathic Profile site and determine your protein's profile.
[Weizmann's Hydropathic Profile run - input format: raw sequence]
In the resulting plot, check to see if there are any large sections below the zero line that are at least 20 residues long. Record the following:
4. Number of possible phobic stretches
5. Location of possible phobic stretches
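For readers who want to see how such a profile is computed, here is a minimal sliding-window sketch. It is not the Weizmann site's method: the Kyte-Doolittle scale, the 9-residue window and the zero threshold are assumptions chosen for illustration. Note also that on the Kyte-Doolittle scale hydrophobic residues score positive, so hydrophobic stretches appear above zero here, whereas the plot described above shows the stretches of interest below the zero line.

```python
# Kyte-Doolittle hydropathy values (hydrophobic residues are positive).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_stretches(sequence, window=9, threshold=0.0, min_len=20):
    """Residue ranges (1-based, inclusive) covered by runs of windows
    whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(sequence, window)
    stretches = []
    run_start = None
    # The -inf sentinel closes a run that reaches the end of the profile.
    for i, value in enumerate(profile + [float("-inf")]):
        if value > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            first, last = run_start + 1, (i - 1) + window
            if last - first + 1 >= min_len:
                stretches.append((first, last))
            run_start = None
    return stretches
```

The length of the returned list answers question 4 and the ranges answer question 5, subject to the choice of scale, window and threshold.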
It is extremely difficult to grow crystals, the prerequisite for x-ray analysis, from transmembrane proteins. Therefore, very few crystal structures of transmembrane proteins are available. Predicting the transmembrane segments of such proteins can be considered a specialized version of secondary structure prediction. Such predictions can provide organizational information useful for understanding how these proteins function, along with some 2D structure relative to the membrane.
Take your protein sequence and use it in the following transmembrane prediction sites to determine if it is a transmembrane protein.
HMMTOP SOSUI TMHMM
[HMMTOP run - input format: raw sequence] [SOSUI run - input format: raw sequence] [TMHMM run - input format: fasta file]
6. Does your protein contain any transmembrane segments?
7. How well do the prediction methods agree with one another?
8. Is there a consensus on areas of your protein that might be transmembrane segments?
If your sequence does not have any transmembrane domains, choose one of the following protein sequences to search against the transmembrane prediction sites. The protein sequences correspond to the following nucleotide sequences given in exercise one: familial hypercholesterolemia, cystic fibrosis, and nocturnal asthma.
cystic fibrosis    cystic fibrosis    familial hypercholesterolemia    nocturnal asthma
sequence 1         sequence 2         sequence                         sequence 1
There are more aspects to transmembrane analysis than just running a few predictions. For those interested in exploring this area in more depth there is an additional optional exercise on the topic.
To determine if your protein is similar to any with solved crystal structures, use your protein to do a BLASTP search against the pdb database. Be sure to have filtering turned off.
[BLASTP run - input format: fasta file]
While waiting for your results, move your mouse over the conserved domains results, if there are any. Scroll down the page to check on the displayed alignments for the domain matches from Pfam.
9. How do the resulting conserved domains compare with your earlier motifs results?
Press the Format! button to retrieve your results, then explore them to answer the following question.
10. Do you have any quality hits in signature regions?
Pick the best hit from each of your signature regions and record its PDB code, chain designation and the location of the matching section.
If your sequence does not have a good BLASTP PDB hit, use the following information to do the upcoming Pfam, PDB and RasMol sections.
PDB code    Pfam signature    blastp match area    details
2HMB        lipocalin         3-116                info
Using the Pfam signature names that you have from before, go to the Sanger Centre's Browse Pfam site. Enter your most significant Pfam signature name from your BLASTP search into the Enter query word(s) box. A list of hits results. It may be that all the family names you are looking for are on that list.
Click on one of your names of interest and go to that family's Pfam page. Scroll down that page until you come to the Database References section. Click on the box with the up/down arrows to get the full listing of the structure files containing that signature.
Is your recorded PDB code on the list? If so, select that code and record its chain designation and area containing the signature.
Repeat this process until you have looked at the information for all your signatures. It may be that your PDB codes are not on the Pfam list. If so, just move on to the next one. Hopefully, at least one of your BLASTP significant signature PDB codes will match a member of the Pfam list.
11. Did you have success matching PDB codes to the Pfam signature members list?
Using the identification code for your structure of interest, go to the PDB site. On this page is a box to Enter a PDB ID or keyword. Enter your code and press the Find a structure button.
The resulting page gives reference information on the PDB file of interest. On the left side of this page is a Download/Display File link that you will use to get your own copy of the data file.
The top part of the resultant page deals with displaying the structure. In the bottom half of the page, select the X link for downloading an uncompressed PDB file from the file format table given. Save the file to your local machine.
Start up RasMol.
To get the second window necessary for entering commands click on the RasMol Command Line section of the bottom gray panel on the screen. Move the resulting two windows so that you can clearly see each of them on your terminal screen. There will be a little overlap between the two windows. Increase the size of the window with the black background.
From the File menu, select Open, then navigate to the pdb file location. Highlight the name of the desired structure file and click on the Open button. Your structure will be displayed in the window with the black background.
From the Display menu, select Cartoons and the structure will now be displayed according to the secondary structure assignments listed in the file. Then, from the Colours menu select Structure to have the image colored in the program's default coloring scheme. Helices will be colored pink, sheets yellow, turns blue and random coil white.
Use the scroll bars to rotate the structure.
Refer to the information you collected on this PDB code from the Pfam site. Click on the command window to make it active. At the RasMol> prompt enter the necessary information to select the region of this structure that corresponds to the signature of interest.
What you do depends on the signature being used.
If your signature has a chain designation, enter the following command. Otherwise skip on to the next instruction.
select (*chain_designation)
select (resno>=start_number) and (resno<=end_number)
colour green
When you are finished looking at your structure and have no other structures to look at, select Exit from the File menu to exit the program. Otherwise, select Close from the File menu to delete the image in the window and go through the process of loading in another structure. When you are finished, use the commands given before to exit the program.
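When several signatures have to be highlighted, the selection commands shown earlier can be generated rather than typed. The sketch below builds them as text to paste at the RasMol> prompt; the helper name `rasmol_select_commands` is hypothetical. It folds the chain and the residue range into a single expression, on the understanding that each RasMol select command replaces the previous selection, so a chain restriction has to sit in the same expression as the residue range to take effect.

```python
def rasmol_select_commands(start, end, chain=None, colour="green"):
    """Build RasMol command lines that select and colour a residue range.

    start, end -- first and last residue number of the signature
    chain      -- optional chain designation (for example "A")
    """
    expression = f"(resno>={start}) and (resno<={end})"
    if chain:
        # Keep the chain test in the same expression as the range.
        expression = f"(*{chain}) and " + expression
    return [f"select {expression}", f"colour {colour}"]


# Example with the fallback values given in this lab (2HMB, lipocalin, 3-116).
for command in rasmol_select_commands(3, 116):
    print(command)
```

For a signature on a named chain, pass the chain letter as the third argument and paste the two printed lines into the command window.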
12. How much of the PDB data file did the signature section comprise?
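One way to put a number on this is to count residues directly from the downloaded PDB file. The following is a rough sketch under stated assumptions: it counts one residue per CA atom in ATOM records using the fixed PDB column layout (chain in column 22, residue number in columns 23-26), ignores HETATM records and insertion codes, and the function name `signature_fraction` is hypothetical.

```python
def signature_fraction(pdb_text, start, end, chain=None):
    """Fraction of residues in the file that lie in start..end.

    Residues are identified by (chain, residue number) of their CA atoms.
    If chain is given, only that chain counts toward the signature.
    """
    residues = set()
    for line in pdb_text.splitlines():
        if len(line) < 26 or not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        residues.add((line[21], int(line[22:26])))
    if not residues:
        return 0.0
    inside = {
        (chain_id, number)
        for chain_id, number in residues
        if start <= number <= end and (chain is None or chain_id == chain)
    }
    return len(inside) / len(residues)
```

Read the file with `open(path).read()` and pass the text together with the signature range recorded from the Pfam page.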
RasMol is simple but effective structure-viewing software. Provided that you have the necessary PDB files and invest the time in learning how to use the program, it is possible to create very colorful demos to illustrate structural concepts.
13. From the data you collected over the last three labs, what can you say about the initial DNA sequence you selected (source [human/mouse], genomic or mRNA fragment, name of the gene, number of exons in the gene)?
14. Summarize the characterization information you have collected on the gene's associated protein. Mention family signatures or secondary structural elements contained in the protein. Is the protein a transmembrane protein? If so, sketch its segments and orientation in the membrane.
Part B
Collect information on the protein given below.
>L-mandelate dehydrogenase
MSQNLFNVEDYRKLRQKRLPKMVYDYLEGGAEDEYGVKHNRDVFQQWRFKPKRLVDVSRRSLQAEVLGKR
QSMPLLIGPTGLNGALWPKGDLALARAATKAGIPFVLSTASNMSIEDLARQCDGDLWFQLYVIHREIAQG
MVLKALHTGYTTLVLTTDVAVNGYRERDLHNRFKIPMSYSAKVVLDGCLHPRWSLDFVRHGMPQLANFVS
SQTSSLEMQAALMSRQMDASFNWEALRWLRDLWPHKLLVKGLLSAEDADRCIAEGADGVILSNHGGRQLD
CAISPMEVLAQSVAKTGKPVLIDSGFRRGSDIVKALALGAEAVLLGRATLYGLAARGETGVDEVLTLLKA
DIDRTLAQIGCPDITSLSPDYLQNEGVTNTAPVDHLIGKGTHA
Run a BLASTP search against the most appropriate database with this sequence and identify a divergent set of at least 4 proteins you think are homologous to this sequence. Write down their Swiss-Prot accession codes. Save these proteins as fasta files on your local machine.
Generate a divergent alignment of at least 4 of these potential homologs using ClustalW. Print off your alignment.
Run a Pfam protein search against the mandelate dehydrogenase (MDH) sequence using the raw sequence in the search.
Identify any significant Pfam models and inspect them to determine how well they match your ClustalW alignment. Scroll down the results page, retrieve the seed alignment for the model and examine it.
Using the Pfam seed alignment, generate a new multiple alignment that includes MDH and your divergent set of homologs. You can do this by going back and repeating your original search of Pfam using the MDH sequence. At the bottom of that output page, click on the button marked "align query/15-379 to FMN_dh(ls) Seed". To change the format of the alignment, select options from the "Format for alignment of query to Seed:" menu near the top of the page.
Mandelate dehydrogenase (MDH) is a membrane-associated protein. The investigators who first characterized it wanted to get a 3D structure but knew this would be difficult because of the membrane association. Therefore, they identified homologs of this sequence that represented soluble proteins, aligned them with mandelate dehydrogenase, and determined a region in mandelate dehydrogenase that might be responsible for membrane association. Because this region does not insert into the membrane but rather is associated with it through a different mechanism (polytopic membrane association), this region cannot be predicted using hydrophobicity plots. Therefore, the only way to determine the relevant region is to align mandelate dehydrogenase with soluble homologs and identify a region that could be substituted between MDH and that homolog to confer a more soluble nature to the MDH.
Determine criteria/characteristics of the alignment that would be most useful for finding such an area. Once you have your candidate regions, map these onto the 3-dimensional structure of one of these homologs using RasMol to determine whether your choice is a reasonable one.
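As one possible starting point, and only as an assumption rather than the criterion used by the original investigators, you might look for alignment columns where MDH has residues but every soluble homolog has a gap. The sketch below takes already aligned, equal-length sequences with '-' as the gap character and reports such runs in ungapped MDH numbering; the function name `query_only_regions` and the minimum run length are hypothetical choices.

```python
def query_only_regions(aligned_query, aligned_homologs, min_len=5):
    """Runs of columns where only the query has residues.

    Returns (start, end) pairs in ungapped query numbering, 1-based inclusive.
    """
    if not aligned_homologs:
        raise ValueError("at least one aligned homolog is required")
    if any(len(h) != len(aligned_query) for h in aligned_homologs):
        raise ValueError("all aligned sequences must have the same length")
    regions = []
    pos = 0  # residue number in the ungapped query
    run_start = run_end = None
    for col, q in enumerate(aligned_query):
        if q != "-":
            pos += 1
        if q != "-" and all(h[col] == "-" for h in aligned_homologs):
            if run_start is None:
                run_start = pos
            run_end = pos
        elif run_start is not None:
            if run_end - run_start + 1 >= min_len:
                regions.append((run_start, run_end))
            run_start = None
    if run_start is not None and run_end - run_start + 1 >= min_len:
        regions.append((run_start, run_end))
    return regions


# Toy alignment, not real MDH data.
query = "MKTAYLLLLLGG--K"
homologs = ["MKSAY-----GGAAK", "MRTAF-----GGSAK"]
print(query_only_regions(query, homologs))  # [(6, 10)]
```

Other characteristics, such as local hydrophobicity or poor conservation, may matter just as much, so treat the output as candidates to inspect on the structure rather than as an answer.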
If you are interested in comparing your choice with that discovered by these investigators, the reference is:
Sukumar et al., (2001) Biochem. 40: 9870-9878
For those who would like to know more about structural analysis, there is an optional exercise on the topic.