Transmembrane proteins are of great interest yet difficult to study, both because the proteins themselves are complex and because their 3D structures are extremely difficult to determine by conventional methods.
Transmembrane proteins can contain from 1 to over 20 transmembrane segments. A transmembrane segment can either completely span a membrane or can be only partially inserted into it. Both helical and sheet secondary structural elements can be used to span a membrane. A single protein may perform the entire transporter function or be one of a number of proteins which do this task.
Classical TMD detection techniques
Calculating hydrophilicity/hydrophobicity
The classic means of finding transmembrane segments is to determine a
hydropathic profile for the protein in question. A description of the
process is given below (text taken from the Weizmann help pages).
The hydropathic profile of a protein is calculated by assigning each amino acid a numerical value ("hydropathy index") and then repetitively averaging these values along the peptide chain. The values assigned to each amino acid can be either the Hopp-Woods values or the Kyte-Doolittle values. The window length over which the hydropathy indices are averaged must also be set. The window length changes the computed profile and therefore affects which and how many proteins are returned in the protein search. Further information about the calculation of hydrophilicity can be found in
The profile determination uses software based on a windowing technique, and the size of the window affects the output. Too small a window produces a noisy graph; too large a window smooths the graph so much that it may remove the signal. The window size should be about the size of the feature being sought (for a transmembrane helix, the assumed size is about 20 residues). A window of 27 will smooth the signal somewhat but should leave the strong hydrophobic peaks intact.
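The windowed averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the software the text refers to: the function name `hydropathy_profile` and the default window of 19 are assumptions made for the example, and only the Kyte-Doolittle scale is included.

```python
# Kyte-Doolittle hydropathy indices (positive values are hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19, scale=KYTE_DOOLITTLE):
    """Return the mean hydropathy of every full window along the sequence.

    Element i of the result covers residues i .. i + window - 1 (0-based).
    Hypothetical helper: residues missing from the scale raise KeyError.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    values = [scale[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []
    # A running sum avoids re-adding the whole window at every position.
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile
```

Calling the function with several window sizes on the same sequence is a quick way to see the trade-off between noise and over-smoothing.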
Protein Functional/Family Database Searching
Protein characterization can also provide information that can be used
to identify transmembrane segments. A number of motifs are
transmembrane specific. Other motifs can be used to identify possible
extracellular regions. Many high probability patterns are short and can give
false positive hits, but they can still help identify possible extracellular regions
of a transmembrane protein. ASN_GLYCOSYLATION is one of these patterns.
N-glycosylation sites, if real, are usually on the extracellular side of a membrane.
This information can be used to help determine the orientation of a protein in
the membrane.
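As an illustration of how short such a pattern is, the sketch below scans a sequence for candidate N-glycosylation sites. It assumes the PROSITE definition of ASN_GLYCOSYLATION, N-{P}-[ST]-{P} (Asn, any residue except Pro, Ser or Thr, any residue except Pro); the function name is hypothetical. Every hit is only a candidate, since a four-residue pattern matches by chance quite often.

```python
import re

# N-{P}-[ST]-{P} written as a lookahead so that overlapping sites are all reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(sequence):
    """Return (1-based position of the Asn, matched 4-mer) for each candidate site."""
    return [
        (match.start() + 1, match.group(1))
        for match in N_GLYCOSYLATION.finditer(sequence.upper())
    ]
```

Candidate sites that fall between two predicted transmembrane segments suggest, but do not prove, that the connecting loop is extracellular.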
TMD prediction
Transmembrane prediction programs are all based on the premise that a
primarily hydrophobic region of sequential amino acids in such a protein
forms a helix that spans the membrane. The very first means of
predicting transmembrane segments was simply to look at the protein's
hydrophobic profile. However, this technique cannot tell the difference
between transmembrane segments and hydrophobic core regions in a protein.
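The simplest profile-based prediction amounts to thresholding the window averages. The sketch below takes a list of window-averaged hydropathy values and reports the residue ranges covered by consecutive windows at or above a cutoff. The function name and the default cutoff of 1.6 (a commonly quoted rule of thumb for Kyte-Doolittle values with a 19-residue window) are assumptions for illustration, not values from this text. As noted above, a region flagged this way may equally be a buried hydrophobic core.

```python
def candidate_segments(profile, window, threshold=1.6):
    """Return 1-based inclusive (start, end) residue ranges of hydrophobic runs.

    profile[i] is assumed to be the mean hydropathy of residues
    i .. i + window - 1 (0-based). A run is a stretch of consecutive
    windows whose mean is at or above the threshold.
    """
    segments = []
    run_start = None
    for i, value in enumerate(profile):
        if value >= threshold:
            if run_start is None:
                run_start = i  # first window of a new run
        elif run_start is not None:
            # Run ended at window i - 1, which covers up to residue i - 1 + window.
            segments.append((run_start + 1, i - 1 + window))
            run_start = None
    if run_start is not None:
        # The run extends to the last window of the profile.
        segments.append((run_start + 1, len(profile) - 1 + window))
    return segments
```

Because each range spans every residue covered by any qualifying window, the reported regions are usually longer than the helix itself and should be treated as rough candidates.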
The main difference between the various prediction programs is the way in which each defines the characteristics of a transmembrane segment. Not all transmembrane segments show the same characteristics. Some are composed entirely of hydrophobic amino acids; others contain some charged residues. The ends of a segment, which sit at the membrane faces, need to be compatible with the hydrophilic environment found there. This observation is sometimes used to help separate transmembrane segments from hydrophobic protein core regions.
Because transmembrane proteins can and do interact with ions and/or small charged molecules, there need to be charged residues within the structure of the channel they form. None of the existing tools is very good at predicting this type of segment.
There currently is no prediction technique that works for those transmembrane proteins which possess sheets spanning a membrane.
Protein Database Searching for Similar Annotated Proteins
Some organisms have been more carefully studied than others. Even when
considering only the human, mouse and rat genomes, a protein from one of these
three species may be more thoroughly studied and therefore more completely
annotated than the others. Searching the databases or literature for protein
family members in other species can often lead to additional information that
allows possible TMD segments to be inferred based on conservation.
Combining information from various sources is often required in order to assign the transmembrane segments properly. This information can come from the literature, physical experiments, motif determinations or database sources.