The following databases are required for the exercises:
- the Swiss-Prot protein sequences, designated sw
- the PROSITE protein motifs, designated pat
- virtual motifs derived from the ’FT’ lines of the Swiss-Prot entries, designated ft
- the NCBI taxonomy, designated taxid
At least two hit lists are calculated: Swiss-Prot vs the PROSITE patterns and Swiss-Prot vs the ft motifs. For the taxonomy, Swiss-Prot was mapped onto the NCBI taxonomy; this mapping is referred to below as a ’hat’ list.
Several other databases are likely to be available from the same site as well. The above databases are updated on a regular basis, and HitKeeper is meant to track these changes. As a consequence, the results of the exercises are expected to change over time.
Use the following commands to explore which data are currently stored in the database. Multiple commands on a single line must be separated with a semicolon ’;’.
cla_list -format=xml # in case you prefer XML
seq_info sw; seq_info seq_source=sw # alternative syntax; same result
mot_info pat ft;
hit_info seq_source=sw mot_source=pat,ft # takes a while
hat_info seq_source=sw cla_source=taxid # takes a while
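The semicolon rule can be illustrated with a small Python sketch that breaks one input line into its individual commands. This is not HitKeeper's own parser: the function name split_commands is hypothetical, and treating everything after ’#’ as a comment is an assumption based only on the listing above.

```python
def split_commands(line: str) -> list[str]:
    """Split one input line into individual commands (illustrative only)."""
    # Assumption: '#' starts a comment that runs to the end of the line.
    code = line.split("#", 1)[0]
    # Commands are separated by ';'. Empty pieces, such as the one left
    # by a trailing ';', are discarded.
    return [part.strip() for part in code.split(";") if part.strip()]


if __name__ == "__main__":
    for cmd in split_commands("seq_info sw; seq_info seq_source=sw # same result"):
        print(cmd)
```

Under these assumptions, the line with two seq_info commands yields two separate commands, and the trailing semicolon after mot_info pat ft produces no extra empty command.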