ISREC Profile Homepage


Overview

The use of generalized profiles is a very sensitive method for discovering distant sequence relationships. In contrast to conventional sequence comparison and database searching methods, the query is not a single sequence but a profile constructed from a family of related sequences. These profiles are normally derived from multiple alignments of the initial sequence set. In addition to the sequences themselves, a profile contains the following information:

In collaboration with Amos Bairoch in Geneva, we are currently creating profiles of various protein domains that are being incorporated into the PROSITE pattern library. For this purpose, we created a new, generalized profile format containing many more parameters than the previous one. A new set of profile search programs can take advantage of these new parameters and allows more sensitive searches as well as novel types of searches.
For a detailed description of this format and related topics see the documents below.
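
To make the idea of deriving a profile from a multiple alignment concrete, here is a minimal Python sketch. It is a deliberately simplified, ungapped position-specific scoring matrix and not the generalized profile format described in the documents below, which carries many more parameters. The function names, the toy alignment, the uniform background frequencies, and the pseudocount are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies
    profile = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gap characters such as '-'.
        counts = Counter(c for c in column if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return profile


def best_ungapped_match(profile, sequence):
    """Slide the profile along the sequence; return (best score, start offset)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside ALPHABET contribute a neutral score of 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


# Toy alignment of four related five-residue segments (made-up data).
alignment = ["GKSTL", "GKSSL", "GKTTL", "GRSTL"]
profile = build_profile(alignment)
score, start = best_ungapped_match(profile, "MAAGKSTLVV")
```

Because the scores are built from the whole family rather than from one member, a query segment can score well even if it matches no single input sequence exactly.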

Selected references

The original profile method:

Improvements to the profile method:

The generalized sequence profiles:

The PROSITE pattern library:
For various applications of the generalized profile technique, see our publication list and check the documents listed below.

Documents on generalized profile syntax and methods

The syntax of profiles in PROSITE
This document is part of the current PROSITE release. It contains a detailed description of the format and provides all the information needed for writing programs that read or write the new format. Note, however, that we have also released a set of free programs that perform sequence comparisons and database searches with profiles in the new format. This program package also contains portable routines for reading and writing the new format that can be used in other programs as well.

PROSITE users manual
This document, written by Amos Bairoch, explains all the information stored in PROSITE and how it can be used.

Methods for the construction of profile entries for the PROSITE database
(K. Hofmann and P. Bucher, 1995). Poster presented at the 3rd International Conference for Intelligent Systems in Molecular Biology, Cambridge/UK, July 1995. This document explains how the generalized profiles in the PROSITE database are constructed. Issues such as iterative profile refinement and profile scaling are briefly discussed.

Normalized profile scores
This document deals with assessing the statistical significance of matches found by the profile search methods. The application of the 'normalized profile score' (NScore) is explained.


A collection of posters on profile applications

Benefits of a Generalized Profile Syntax for Biomolecular Sequence Motifs
(K. Hofmann and P. Bucher, 1994). Poster presented at the 3rd conference on Genes, Proteins and Computers, Chester/UK 1994. This poster is also available in compressed PostScript format. It describes the advantages of profile-based database searches. As an example, the detection of sequence similarity between inositol-monophosphatase, fructose-1,6-bisphosphatase and inositol polyphosphate 1-monophosphatase is demonstrated.

Detection and Analysis of Distantly Related C2-like Membrane Attachment Domains
(K. Hofmann and P. Bucher, 1995). Poster presented at the 1st European Protein Society Meeting, Davos/CH 1995. This poster is also available in compressed PostScript format. The generalized profile method is used to demonstrate the occurrence of C2-like domains in proteins such as the novel PLC isoforms, phospholipase C, cytosolic phospholipase A2, perforin, and many more.

Conserved sequence domains in cell cycle regulatory proteins
(K. Hofmann and P. Bucher, 1996). Poster presented at the joint ISREC/AACR meeting "Cancer and the Cell cycle", Lausanne/CH January 1996. This document shows several examples of weakly conserved domains in cell cycle regulatory proteins that were detected using the profile method.


Profile-related software

ISREC ProfileScan Server
(Search the profile entries in PROSITE with your sequence). This is an experimental implementation of the pfscan program. The profile entries contained in PROSITE, recognizable by the keyword MATRIX, can be searched with a single, user-supplied sequence. Major new data release and Pfam now searchable!
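
As a rough illustration of how profile entries can be told apart by the MATRIX keyword, the following Python sketch lists the names of MATRIX entries in a PROSITE-style flat file. It assumes that each entry starts with an ID line of the form "ID   NAME; TYPE." and ends with a "//" line. The function name and the sample entries (including their names and accession numbers) are made up for the example and do not come from PROSITE.

```python
def matrix_entry_names(prosite_text):
    """Return the names of all entries whose ID line carries the type MATRIX."""
    names = []
    for line in prosite_text.splitlines():
        if line.startswith("ID "):
            # Assumed ID line layout: "ID   NAME; TYPE."
            name, _, entry_type = line[2:].strip().partition(";")
            if entry_type.strip().rstrip(".") == "MATRIX":
                names.append(name.strip())
    return names


# Made-up sample in the assumed flat-file layout (not real PROSITE entries).
sample = """\
ID   EXAMPLE_PATTERN; PATTERN.
AC   PS00000;
//
ID   EXAMPLE_DOMAIN; MATRIX.
AC   PS00000;
//
"""

matrix_names = matrix_entry_names(sample)
print(matrix_names)
```

Only the ID line is inspected here; a real reader would also need to parse the profile data lines, for which the format description and the pftools routines are the authoritative sources.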

Download the pftools package
The pftools package contains programs for generalized profile applications. The source code in FORTRAN77 and executables for various platforms are available. The current release 1.0 contains the programs pfsearch, pfscan, and GtoP. Problems should be reported to Philipp Bucher, the author of the package. Pftools 2.0 now available!

Anyone who is interested in more information on profiles, or who would like to contribute profiles or good multiple alignments of protein domains, should contact Philipp Bucher or Kay Hofmann.


For more information on PROSITE, visit the PROSITE homepage in Geneva.

Go to the ISREC-bioinformatics home page