The C2-domain of the classical protein kinase C isoforms, which is also
present in synaptotagmin,
has been shown to be important for Ca2+-dependent membrane attachment of
these proteins. Subsequently, this domain of about 120 residues was also found in
rabphilin, unc-13, and cytosolic phospholipase A2. By applying an improved
profile-based search method, we were able to detect previously unknown C2-like domains in a
great variety of otherwise unrelated proteins.
These include several non-classical PKC isoforms, all isoforms of PI-specific phospholipase C, the plant
phospholipase D, phosphatidylinositol-3-kinase, several GTPase-activating proteins, and, as an
exceptional case, the extracellular protein perforin. A common denominator of all these
proteins is their requirement for membrane attachment, although not in all cases regulated
by Ca2+. Most of these proteins are involved in the major intracellular signal transduction
pathways.
A detailed analysis of the C2-domain sequences, in combination with the
knowledge of the 3D-structure of one C2-domain, allows the determination of residues
important for the various functional aspects of this domain. This knowledge can in turn be
applied to predict the role of the C2-domain in several uncharacterized protein sequences
that have become available through the genome projects.
Sequence profiles provide a very sensitive method for discovering distant sequence relationships. In contrast to conventional sequence comparison and database search methods, the query is not a single sequence but a profile constructed from a family of related sequences. These profiles are normally derived from multiple alignments of the initial sequence set. In addition to the sequences themselves, a profile contains the following information:
For the detection of distant relationships, the assessment of statistical significance is very important. We derive an estimate of the error probability from an empirical score distribution obtained by searching a randomized version of SwissProt (shuffling window 20). As can be seen from the figure below, in most cases the score distribution can be approximated by an extreme value distribution.
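The randomization step can be pictured as a windowed shuffle. The sketch below assumes that "shuffling window 20" means residues are permuted within consecutive, non-overlapping blocks of 20, which preserves local composition while destroying residue order. The function name `window_shuffle` and this reading of the window are illustrative assumptions, not a description of the procedure actually used to randomize SwissProt.

```python
import random


def window_shuffle(seq, window=20, rng=None):
    """Shuffle residues inside consecutive, non-overlapping windows.

    Assumption: the window is applied block by block; a sliding-window
    scheme would be an equally plausible reading of the text.
    """
    rng = rng or random.Random()
    shuffled = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)  # in-place permutation of this block only
        shuffled.extend(block)
    return "".join(shuffled)


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from any database.
    example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"
    print(window_shuffle(example, window=20, rng=random.Random(1)))
```

Scoring a profile against every shuffled sequence then yields an empirical distribution of scores expected from unrelated sequences of realistic composition.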
An additional advantage of the profile method is the possibility of iterative profile refinement. Sequences with highly significant similarity to the profile are aligned to the profile and included in the profile construction process for the next round of database searches.
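The refinement loop can be sketched in a few lines. This is a deliberately simplified illustration: it assumes gap-free sequences of equal length, a uniform background, and plain log-odds column scores with pseudocounts, whereas real generalized profiles also carry gap penalties and sequence weights. All names (`build_profile`, `score`, `refine`) and the threshold are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(aligned, pseudocount=1.0):
    """Per-column log-odds scores from equal-length, gap-free sequences."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    total = len(aligned) + pseudocount * len(ALPHABET)
    profile = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum of column scores; seq must use only letters in ALPHABET."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def refine(seed, database, threshold, max_rounds=5):
    """Add database sequences scoring >= threshold, then rebuild and repeat."""
    members = list(seed)
    for _ in range(max_rounds):
        profile = build_profile(members)
        hits = [s for s in database
                if s not in members and score(profile, s) >= threshold]
        if not hits:
            break  # converged: no new members in this round
        members.extend(hits)
    return members


if __name__ == "__main__":
    # Toy data for illustration only.
    print(refine(["ACDE", "ACDE"], ["ACDF", "WWWW"], threshold=2.0))
```

In practice the inclusion threshold would come from the significance estimate rather than a fixed score, so that only highly significant matches enter the next round.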
Currently, we are constructing generalized profiles for the most important of the highly divergent protein domains and families. These profiles will be included in the PROSITE pattern collection. We are also trying to detect and describe previously unknown homology domains, with emphasis on proteins involved in signal transduction and regulatory pathways.
The upper part of the figure shows the frequency distribution obtained by a run of the C2 profile against a randomized version of SwissProt.
The lower part of the figure shows the base-10 logarithm of the cumulative frequency
distribution of scores from the C2-domain profile run against the randomized database. (The
cumulative frequency value corresponding to the score 300 is the number of sequences giving
scores better than 300.)
For most profiles, the graph can be approximated by a linear relationship in the high score
range.
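The linear relationship in the high score range suggests a simple way to extrapolate the number of random hits expected at a given score. The sketch below is one possible reading, not the authors' implementation: it counts sequences scoring at least a given value (rather than strictly better, to avoid taking the logarithm of zero at the top score), fits a least-squares line to the highest-scoring points, and extrapolates. The function names and the `tail_fraction` parameter are hypothetical.

```python
import math


def cumulative_log10(scores):
    """Return sorted (score, log10(number of scores >= score)) pairs."""
    ordered = sorted(scores, reverse=True)
    count_at_least = {}
    for rank, s in enumerate(ordered, start=1):
        count_at_least[s] = rank  # for tied scores the largest rank wins
    return sorted((s, math.log10(n)) for s, n in count_at_least.items())


def tail_fit(points, tail_fraction=0.1):
    """Least-squares line through the highest-scoring fraction of points.

    Needs at least two distinct scores in the tail.
    """
    ordered = sorted(points)
    n_tail = max(2, math.ceil(len(ordered) * tail_fraction))
    tail = ordered[-n_tail:]
    xs = [x for x, _ in tail]
    ys = [y for _, y in tail]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expected_random_hits(slope, intercept, score):
    """Extrapolated number of randomized sequences scoring at least `score`."""
    return 10 ** (slope * score + intercept)
```

Dividing the extrapolated count by the number of sequences in the randomized database gives a rough per-sequence error probability for a match at that score.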
Left: The Protein Kinase C family.
Right: The PI-specific Phospholipase C family.
Left: Synaptotagmins and other proteins with multiple C2-domains.
Right: Miscellaneous proteins with C2-like domains.
We have created profiles for the following domains and protein families. Some of them are already part of PROSITE, the others will appear there soon. A set of programs for the comparison of sequences with collections of profiles will be available at the time of the next PROSITE release.
SH2 domain**, SH3 domain**, PH domain**, C1 domain**, C2 domain**, PID domain**, rasGAP domain*, rhoGAP domain*, rapGAP family*, rabGAP family, arfGAP family, cdc24-type rasGRF domain*, cdc25-type rasGRF domain*, rcc-repeat (ranGRF)*, rhoGDI-family, rabGDI-family
C3H2C3-type RING finger*, rsp5/WW-domain*, forkhead-associated (FHA) domain*, polo-box*, death-box*, lipoxygenase appendage domain*, bromo-domain*, chromo-domain*, IQ-domain*, BTB-domain*, α-latrotoxin receptor interaction domain
forkhead-domain*, myb-domain*, ets-domain*, HMG (high-mobility group) domain*, MCM (mini-chromosome maintenance) domain, KH-domain*
protein-kinase domain**, lipid kinase (PI3K) domain*, PI-specific PLC X-box** and Y-box**, bacterial PLC-domain, bacterial SMase, intracellular (plant-type) PLD, extracellular PLD, intracellular PLA2, HECT-domain (ubiquitin-transferase)*
leucine-rich repeat*, LRR-flanking regions*, TPR repeat**, wd40 repeat**, ankyrin repeat*, spectrin repeat*, gelsolin repeat*, filamin/ABP280 repeat
cub domain**, anaphylatoxin-domain**, saposin II-domain*, C-type lectin domain*, thrombospondin type I domain*, archaebacterial surface layer repeat*
hsp20-family**, globin-family**, cpn10-family*, ricin-family*, IMB/FBP/IPP-family*
Profiles that are already part of PROSITE are labeled with (**); experimental profiles that are not yet part of PROSITE but can be searched using our WWW-server (http://ulrec3.unil.ch) are labeled with (*).