MyHits documentation

The PSI-BLAST program provides an effective way to work with protein domains. PSI-BLAST is fast, robust and relatively easy to use. However, it is less sophisticated than programs based on profile-HMM technology (PFTOOLS, HMMER), especially in its treatment of gaps. PSI-BLAST is excellent for detecting domains and establishing homology, but it sometimes produces alignments of sub-optimal quality.

1 - PSI-BLAST tutorial

Read and work through the PSI-BLAST tutorial on the MyHits web site, where the PSI-BLAST program is used in conjunction with other pieces of software (e.g. Jalview). The NCBI also provides a tutorial for its own web interface.

2 - Iterative Training

Execute four cycles of PSI-BLAST using the sequence below as the initial query:

>sw:ERCC5_XENLA/27-95
LAVDISIWLNQAVKGARDRQGNAIQNAHLLTLFHRLCKLLFFRIRPIFVFDGEAPLLKRQTLAKRRQRT

Restrict your search to the Swiss-Prot database. Simply launch the next iteration with the "...next cycle" option. At every cycle, record the number of matches with an E-value at or below the threshold, and the E-values obtained for the proteins ERCC5_XENLA, FEN1_HUMAN and DIN7_YEAST. Complete the table below and explain what you observe.

	#matches	ERCC5_XENLA	FEN1_HUMAN	DIN7_YEAST
Cycle 1
Cycle 2
Cycle 3
Cycle 4
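
One way to keep track of the table is sketched below in Python. It assumes the hit list of each cycle has been copied by hand from the output as identifier / E-value pairs. The numbers shown are made-up placeholders, not real PSI-BLAST results, and the 0.001 threshold, the function name summarize_cycle and the identifier SOME_OTHER_HIT are assumptions chosen for illustration.

```python
# Proteins whose E-values the exercise asks you to follow.
TRACKED = ("ERCC5_XENLA", "FEN1_HUMAN", "DIN7_YEAST")


def summarize_cycle(hits, threshold=0.001):
    """Return (number of matches at or below threshold, E-values of tracked proteins).

    hits: dict mapping identifier -> E-value for one cycle.
    A tracked protein that is absent from the hit list is reported as None.
    """
    n_matches = sum(1 for evalue in hits.values() if evalue <= threshold)
    tracked = {name: hits.get(name) for name in TRACKED}
    return n_matches, tracked


# Placeholder values only; replace them with the numbers read off the output.
example_cycle = {
    "ERCC5_XENLA": 1e-30,
    "FEN1_HUMAN": 0.5,
    "SOME_OTHER_HIT": 1e-5,
}

n_matches, tracked = summarize_cycle(example_cycle)
print(n_matches, tracked)
```

Calling summarize_cycle once per cycle gives one row of the table; a protein reported as None was not matched at all in that cycle.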

3 - Building a "Model" for the Thioredoxin Domain

Using PSI-BLAST, retrieve all homologs of the human thioredoxin protein THIO_HUMAN in Swiss-Prot. The "brute force" approach is as follows:

Tick the "Let PSI-BLAST cycle until convergence" checkbox to enable the automated iterative mode.
Tick the "Cluster matches at approx. identity level" checkbox and set the level to 50 % to prevent the list of matches from growing too large.
Send the resulting alignment to the MSA hub.
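
The effect of clustering matches at an approximate identity level can be illustrated with a small Python sketch. This is a simple greedy scheme written for illustration, not the procedure MyHits actually uses. The function names and the toy sequences are hypothetical, and the sequences are assumed to be already aligned to the same length.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored.
    """
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * same / len(pairs)


def greedy_cluster(seqs, level=50.0):
    """Keep one representative per cluster.

    seqs: dict identifier -> aligned sequence.
    A sequence is absorbed by the first representative it matches at
    >= level percent identity; otherwise it becomes a new representative.
    """
    representatives = []
    for name, seq in seqs.items():
        if not any(percent_identity(seq, seqs[rep]) >= level
                   for rep in representatives):
            representatives.append(name)
    return representatives


# Toy sequences, invented for this example.
toy = {
    "seqA": "ACDEFGHIKL",
    "seqB": "ACDEFGHIKV",  # 9 of 10 positions identical to seqA
    "seqC": "WWWWWGHIKL",  # 5 of 10 positions identical to seqA
    "seqD": "MMMMMMMMKL",  # 2 of 10 positions identical to seqA
}
print(greedy_cluster(toy, level=90.0))
print(greedy_cluster(toy, level=50.0))
```

A lower identity level merges more sequences into each cluster, which is why 50 % keeps the list of matches short.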

Using MAFFT, re-align the matched sequences. This is needed because the alignment produced by the automated iterative mode of PSI-BLAST accumulates many small alignment errors.
Send the resulting alignment to the MSA hub.

Using Jalview2, trim the extremities of the MSA, i.e. remove the highly gapped regions at both ends, where the alignment quality is poor and which carry little information about the domain.
Send the resulting alignment to the MSA hub (File -> Output to Browser).
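
What the trimming step achieves can be sketched in Python. This is not what Jalview does internally; it is a minimal illustration, with a hypothetical function name, an assumed cut-off of 50 % gaps per column and an invented toy alignment, that removes columns from both ends of the alignment until a column with few enough gaps is reached.

```python
def trim_gapped_ends(msa, max_gap_fraction=0.5):
    """Trim highly gapped columns from both ends of an alignment.

    msa: dict identifier -> aligned sequence (all of equal length).
    Columns are dropped from the left and from the right while their
    fraction of gap characters exceeds max_gap_fraction.
    Internal columns are never removed.
    """
    seqs = list(msa.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    def gap_fraction(col):
        return sum(1 for s in seqs if s[col] == "-") / len(seqs)

    start = 0
    while start < length and gap_fraction(start) > max_gap_fraction:
        start += 1
    end = length
    while end > start and gap_fraction(end - 1) > max_gap_fraction:
        end -= 1
    return {name: seq[start:end] for name, seq in msa.items()}


# Toy alignment, invented for this example.
toy_msa = {
    "seq1": "MK-ACDEFG--",
    "seq2": "---ACDEFGHI",
    "seq3": "---AC-EFG--",
}
trimmed = trim_gapped_ends(toy_msa)
for name, seq in trimmed.items():
    print(name, seq)
```

Only the ends are trimmed here, so the internal gap in seq3 is kept; deciding what to do with gapped internal columns is a separate editing choice.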

You now have an MSA made of a set of sequences that are (hopefully) representative of the diversity of thioredoxin sequences. You can view it as a kind of "model" of the thioredoxin domain. Save it locally (in FASTA format) for future use.

At this stage, you may consider improving your model by adding or removing sequences, or by manually editing the alignment. This, however, requires some biological expertise.

4 - Exploiting the Thioredoxin "Model"

To look at all the human proteins with a thioredoxin domain found in Swiss-Prot, paste your MSA into the PSI-BLAST query form and:

Preselect the Swiss-Prot database.
Set the "Taxonomic restriction" to human.
Tick the "Cluster matches at approx. identity level" checkbox and set the level to 90 %.

Now pay attention to the graphics that appear in the output. You should be able to recognize the thioredoxin active site C-x-x-C, whose two cysteines form a redox-active disulphide bridge. Using the "graphics" links, look at the annotated alignments in detail. Three particularly interesting ones are sw:THIO_HUMAN, sw:PDIA1_HUMAN and sw:TXN4A_HUMAN.
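
To locate the C-x-x-C pattern in a sequence of your own, a regular expression is enough. The Python sketch below is an illustration only: the function name is hypothetical, the test fragment is a made-up string rather than a real protein, and a C-x-x-C match by itself does not prove that a thioredoxin active site is present.

```python
import re

# The lookahead makes sure that overlapping occurrences are all reported.
CXXC = re.compile(r"(?=(C[A-Z]{2}C))")


def find_cxxc(sequence):
    """Return (1-based position, matched residues) for every C-x-x-C occurrence."""
    return [(m.start() + 1, m.group(1)) for m in CXXC.finditer(sequence.upper())]


# Made-up fragment for illustration; paste a real sequence here instead.
fragment = "AAWCGPCKAACAACAAC"
print(find_cxxc(fragment))
```

Positions are reported 1-based so that they can be compared directly with the residue numbering shown in the annotated alignments.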
