8.1 Scope

hit_query returns a list of hits, exactly as they are stored in the database. A hit is the location on a sequence where a motif matches. Every hit carries a ‘signif’ string that helps categorize it; for a simple pattern search, it is always “!”:

 

> hit_query seq_name=sw:THIO_HUMAN 
  sw:THIO_HUMAN  24   42  !  pat:THIOREDOXIN_1 
  sw:THIO_HUMAN   1  102  !  prf:THIOREDOXIN_2 raw_score=51

A discussion of how signif can be attributed is available online at http://myhits.isb-sib.ch/doc/scores.html. Note, however, that these settings are valid only for the MyHits Web site; other settings can be used if more appropriate.
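As an illustration of how the output above could be consumed by a script, here is a minimal Python sketch that splits each result line into fields. The field names (start, end, attrs) and the reading of the two numbers as start and end positions of the match are assumptions made for this example; Hit and parse_hit_line are hypothetical helpers, not part of MyHits.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    seq_name: str   # e.g. "sw:THIO_HUMAN"
    start: int      # assumed: first position of the match
    end: int        # assumed: last position of the match
    signif: str     # e.g. "!"
    mot_name: str   # e.g. "pat:THIOREDOXIN_1"
    attrs: dict     # optional trailing key=value fields, kept as strings


def parse_hit_line(line: str) -> Hit:
    """Split one whitespace-separated hit_query result line into a Hit."""
    fields = line.split()
    seq_name, start, end, signif, mot_name = fields[:5]
    attrs = dict(field.split("=", 1) for field in fields[5:])
    return Hit(seq_name, int(start), int(end), signif, mot_name, attrs)


lines = [
    "  sw:THIO_HUMAN  24   42  !  pat:THIOREDOXIN_1 ",
    "  sw:THIO_HUMAN   1  102  !  prf:THIOREDOXIN_2 raw_score=51",
]
hits = [parse_hit_line(line) for line in lines]
for hit in hits:
    print(hit)
```

Trailing attributes such as raw_score are left as strings here, since the sketch makes no assumption about their types.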

 


Figure 8.1: Hit logic.

 


8.2 How does it work?

To understand how hit_query is designed, consider a dataset with a number of hits as depicted in fig. 8.1. The set contains 4 sequences, 4 motifs, and 7 hits on these. Note that seq:3 has two hits by motif mot:3. The following command

 

> hit_query seq_name=seq:1,seq:2 and_seq_name=seq:2,seq:3

yields the result shown in fig. 8.2: only seq:2 is “retained”, since it is the only sequence matching both conditions (remember: the query constraints use AND logic).

 


Figure 8.2: Using query constraints: the logic of hit_query seq_name=seq:1,seq:2 and_seq_name=seq:2,seq:3

 


Similarly, the following query works at the motif level:

> hit_query mot_name=mot:1,mot:2 and_mot_name=mot:2,mot:4

with the result shown in fig. 8.3.

 


Figure 8.3: Query constraints, continued: The logic of hit_query mot_name=mot:1,mot:2 and_mot_name=mot:2,mot:4
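The AND logic of the two queries above can be sketched in a few lines of Python. This is only a model of the behaviour described in the text, not the actual implementation of hit_query. The hit table is a hypothetical stand-in for fig. 8.1: it respects the stated counts (4 sequences, 4 motifs, 7 hits, two hits of mot:3 on seq:3), but which motif hits which sequence, and the start positions, are invented for the example. The function name hit_filter is likewise hypothetical.

```python
# Hypothetical stand-in for the dataset of fig. 8.1: (sequence, motif, start).
HITS = [
    ("seq:1", "mot:1", 5),
    ("seq:1", "mot:2", 40),
    ("seq:2", "mot:2", 12),
    ("seq:2", "mot:4", 60),
    ("seq:3", "mot:3", 8),
    ("seq:3", "mot:3", 90),  # second hit of the same motif on seq:3
    ("seq:4", "mot:4", 30),
]


def hit_filter(hits, seq_name=None, and_seq_name=None,
               mot_name=None, and_mot_name=None):
    """Keep the hits that satisfy every constraint that was supplied."""
    def ok(value, allowed):
        # A constraint that was not given (None) does not restrict anything.
        return allowed is None or value in allowed

    return [
        (seq, mot, start)
        for seq, mot, start in hits
        if ok(seq, seq_name) and ok(seq, and_seq_name)
        and ok(mot, mot_name) and ok(mot, and_mot_name)
    ]


# Model of: hit_query seq_name=seq:1,seq:2 and_seq_name=seq:2,seq:3
by_seq = hit_filter(HITS, seq_name={"seq:1", "seq:2"},
                    and_seq_name={"seq:2", "seq:3"})
# Model of: hit_query mot_name=mot:1,mot:2 and_mot_name=mot:2,mot:4
by_mot = hit_filter(HITS, mot_name={"mot:1", "mot:2"},
                    and_mot_name={"mot:2", "mot:4"})
print(by_seq)
print(by_mot)
```

In this model, only hits on seq:2 survive the first query and only hits by mot:2 survive the second, because each pair of lists is effectively intersected.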

 


Hit lists can also be combined with one another. As an example, the two left-hand images in fig. 8.4 depict two sets of hits that were identified by two different queries; let’s call them $X and $Y. Combining them yields a new, more targeted hit list, shown in the rightmost graphic in fig. 8.4:

> hit_query hit_list=$X and_hit_list=$Y

 


Figure 8.4: Combining two hit lists: the logic of hit_query hit_list=$X and_hit_list=$Y
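Assuming that combining two hit lists with AND keeps exactly the hits present in both, the operation can be modelled as an intersection. The Python sketch below is illustrative only: the contents of X and Y are invented, hits are represented as hypothetical (sequence, motif, start) tuples, and combine_hit_lists is not a MyHits function.

```python
def combine_hit_lists(x, y):
    """Return the hits of x that also occur in y, in the order of x."""
    in_y = set(y)
    return [hit for hit in x if hit in in_y]


# Invented contents standing in for the saved hit lists $X and $Y.
X = [("seq:1", "mot:2", 40), ("seq:2", "mot:2", 12), ("seq:2", "mot:4", 60)]
Y = [("seq:2", "mot:2", 12), ("seq:3", "mot:3", 8)]

combined = combine_hit_lists(X, Y)
print(combined)
```

The combined list can never be longer than either input, which matches the idea of a more targeted result.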

 


8.3 Options

As an overview, the following constraints are supported:

seq_source=...
A non-empty list of sequence database names.
seq_name=...
A list of sequence entry names (given explicitly, or implicitly using query identifiers) to be included in the results.
and_seq_name=...
A list of sequence entry names to be included in the results (logical AND with the previous constraint).
not_seq_name=...
A list of sequence entry names to be excluded from the results (logical NOT to restrict the two previous constraints).
mot_source=...
A non-empty list of motif database names.
mot_name=...
A list of motif entry names (given explicitly, or implicitly using query identifiers) to be included in the results.
and_mot_name=...
A list of motif entry names to be included in the results (logical AND with the previous constraint).
not_mot_name=...
A list of motif entry names to be excluded from the results (logical NOT to restrict the two previous constraints).
hit_name=...
A hit list (given using query identifiers) to be included in the results.
and_hit_name=...
A hit list to be included in the results (logical AND with the previous constraint).
not_hit_name=...
A hit list to be excluded from the results (logical NOT to restrict the two previous constraints).
signif=...
A list of hit signif values to be included in the results.
not_signif=...
A list of hit signif values to be excluded from the results.
-lim=...
Maximum number of rows to be returned.
-ref=...
A query identifier, i.e. a string that starts with "$" followed by a letter, possibly followed by more letters, digits or underscores. This is how a query can be saved to be re-used later in other operations. When supplied, this option prevents the query from being executed.
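The syntax of a query identifier, as described for -ref, translates directly into a regular expression. The Python sketch below assumes that "letter" means an ASCII letter, which the text does not state explicitly; is_query_identifier is a hypothetical helper for checking identifiers before passing them to a command.

```python
import re

# "$", then one letter, then any number of letters, digits or underscores.
# Assumption: only ASCII letters are allowed.
QUERY_ID = re.compile(r"\$[A-Za-z][A-Za-z0-9_]*")


def is_query_identifier(text: str) -> bool:
    """Return True if the whole string is a well-formed query identifier."""
    return QUERY_ID.fullmatch(text) is not None


for candidate in ["$X", "$hits_2", "$1x", "$_a", "X", "$"]:
    print(candidate, is_query_identifier(candidate))
```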