The PROSITE pattern syntax is described in detail in the User Manual. Here is a summary:

  • The standard IUPAC one-letter codes for the amino acids are used.
  • The symbol 'x' is used for a position where any amino acid is accepted.
  • Ambiguities are indicated by listing the acceptable amino acids for a given position between square brackets '[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
  • Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met.
  • Each element in a pattern is separated from its neighbor by a '-'.
  • Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range between parentheses. Examples: x(3) corresponds to x-x-x, and x(2,4) corresponds to x-x or x-x-x or x-x-x-x.
  • When a pattern is restricted to the N-terminus or the C-terminus of a sequence, it starts with a '<' symbol or ends with a '>' symbol, respectively.
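
The rules above map fairly directly onto regular expressions. The following Python function is a minimal sketch of that mapping, not the PROSITE implementation: the name prosite_to_regex is hypothetical, and it handles only the elements summarized in the list above.

```python
import re

# One pattern element: 'x', a single residue, [..] or {..},
# optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (illustrative sketch)."""
    # Spaces, tabs, and line breaks are ignored.
    pattern = "".join(pattern.split())

    # '<' anchors the pattern at the N-terminus, '>' at the C-terminus.
    anchor_start = pattern.startswith("<")
    anchor_end = pattern.endswith(">")
    if anchor_start:
        pattern = pattern[1:]
    if anchor_end:
        pattern = pattern[:-1]

    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            regex = "."                          # any amino acid
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"   # excluded amino acids
        else:
            regex = residue                      # single residue or [..] set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)

    return ("^" if anchor_start else "") + "".join(parts) + ("$" if anchor_end else "")


print(prosite_to_regex("C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C"))
```

For this input the function prints C.C.{2}[GP][FYW].{4,8}C. Elements it does not recognize raise a ValueError rather than being guessed at.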
Example: type C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C in the text area of the pattern_search form (spaces, tabs, and line breaks are ignored), select the Swiss-Prot checkbox, and click the search button. This produces two lists:
(i) all the Swiss-Prot proteins that contain one or more occurrences of the pattern.
(ii) an exhaustive list of all sub-sequences matched by the pattern.
The pattern of this example can be translated as Cys-any-Cys-any-any-[Gly or Pro]-[Phe or Tyr or Trp]-(four to eight of any)-Cys. It corresponds to the pattern pat:EGF_2 and produces a large number of matches.
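
To try the example pattern locally on a single sequence, it can be written by hand as a regular expression. The sketch below is an assumption-laden illustration, not how the search form works internally: find_matches is a hypothetical helper, the toy sequence is made up, and the lookahead is one way to report matches that start inside another match (for each start position it returns the longest match found there).

```python
import re

# Hand translation of C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C.
# Wrapping it in a lookahead lets the scan also find overlapping matches.
PATTERN_RE = re.compile(r"(?=(C.C.{2}[GP][FYW].{4,8}C))")


def find_matches(sequence):
    """Return (1-based start position, matched sub-sequence) pairs."""
    return [(m.start() + 1, m.group(1)) for m in PATTERN_RE.finditer(sequence)]


# Made-up toy sequence for illustration, not a real protein.
toy = "AACPCAAGFAAAACAA"
print(find_matches(toy))  # [(3, 'CPCAAGFAAAAC')]
```

A sequence without the motif gives an empty list.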
Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N.
PROSITE, a protein domain database for functional characterization and annotation.
Nucleic Acids Res. 2010 Jan; 38(Database issue):D161-6.