3 Data used in this tutorial

The following databases are required for the exercises:

  • the Swiss-Prot protein sequences, designated sw
  • the PROSITE protein motifs, designated pat
  • virtual motifs derived from the ’FT’ lines of the Swiss-Prot entries, designated ft
  • the NCBI taxonomy, designated taxid

At least two hit lists are calculated: Swiss-Prot vs the PROSITE patterns and Swiss-Prot vs the ft motifs. In addition, Swiss-Prot was mapped onto the NCBI taxonomy; this mapping is referred to below as a ’hat’ list.

Several other databases are likely to be available from the same site as well. The above databases are updated on a regular basis, and HitKeeper is meant to track these changes. As a consequence, the results of the exercises are expected to change over time.

Use the following commands to explore which data are currently stored in the database. Multiple commands on a single line must be separated with a semicolon ’;’.

 

seq_list;  mot_list 
cla_list -format=xml                     # in case you prefer XML
hit_list;  hat_list 
seq_info sw; seq_info seq_source=sw      # alternative syntax; same result 
cla_info taxid 
mot_info pat ft; 
hit_info seq_source=sw mot_source=pat,ft # takes a while 
hat_info seq_source=sw cla_source=taxid  # takes a while
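
The syntax used in the listing above (several commands per line separated by ’;’, positional or key=value arguments, comma-separated values, trailing ’#’ comments) can be illustrated with a small Python sketch. This is not HitKeeper's own parser; the function names split_commands and parse_command are hypothetical, and the rules are only inferred from the examples above (in particular, that ’#’ starts a comment running to the end of the line).

```python
def split_commands(line):
    """Split one input line into individual commands.

    Assumption: '#' starts a comment that runs to the end of the line,
    and ';' separates commands, as in the listing above.
    """
    code = line.split("#", 1)[0]               # drop a trailing comment first
    parts = [p.strip() for p in code.split(";")]
    return [p for p in parts if p]             # skip empty pieces, e.g. after a trailing ';'


def parse_command(command):
    """Return (name, positional_args, keyword_args) for one command."""
    tokens = command.split()
    name, args, kwargs = tokens[0], [], {}
    for token in tokens[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            # Options such as -format=xml lose their leading dash;
            # values such as pat,ft become a list of sources.
            kwargs[key.lstrip("-")] = value.split(",")
        else:
            args.append(token)
    return name, args, kwargs


if __name__ == "__main__":
    line = "seq_info sw; seq_info seq_source=sw      # alternative syntax; same result"
    for cmd in split_commands(line):
        print(parse_command(cmd))
```

Stripping the comment before splitting on ’;’ matters here, because the comment in the seq_info example itself contains a semicolon.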