Understanding hhblits output
5.5 years ago
mnsp088 ▴ 100

Hi everyone,

I just ran my first hhblits search (hhblits -cpu 4 -M first -i MSA/g_1.fa.out -d my_databases/my_db) and noticed that there are multiple hits to the same cluster in my results file (e.g. see the Hit column below). I'm guessing this represents different domains with homology to my query MSA that are all significant, but I wanted to double-check whether this makes sense. Has anyone run this before and seen similar output?

 No   Hit          Prob   E-value P-value  Score  SS  Cols  Query HMM  Template HMM
  1 cluster_id_124 100.0   1E-42 6.7E-46  242.0   0.0  201   13-221   101-350 (396)
  2 cluster_id_124 100.0 1.6E-42   1E-45  241.0   0.0  202    7-219    48-261 (396)
  6 cluster_id_124 100.0 9.2E-37 6.1E-40  211.5   0.0  198   11-218   142-391 (396)

Also, my database is made up of ~2k HMMs, so why does the results file report only 136 searched HMMs?

Query         g_1
Match_columns 229
No_of_seqs    1529 out of 22987
Neff          11.9485
Searched_HMMs 136

Thank you for any input.

hhsuite hhblits homology HMM

Is this from a custom database?

The output looks reasonable at a glance, but I haven't seen cluster_id_xxx names before. I typically use hhsearch, so there could be some difference between the two programs that I'm not accounting for.

I usually run my searches against the PDB, so I get PDB hits back.

Yes, this is from a custom database. Each HMM in my database is produced from a multiple sequence alignment of an ortholog group.

Do you also see duplicate hits when you search against the PDB?

5.5 years ago
Joe 21k

Yep, it's quite common to get multiple hits to the same entry with the PDB. This can happen because there are multiple internal matches within a sequence (e.g. repetitive spans) or multiple domains.
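
One way to check which of these cases applies is to compare the query and template ranges of the repeated rows. The Python sketch below is a minimal illustration and not part of hh-suite: it assumes hit-table rows laid out like the ones in the question (No, Hit, Prob, E-value, P-value, Score, SS, Cols, query range, template range, template length), and the helper names parse_rows, overlap and group_by_hit are hypothetical. The numeric columns are anchored to the end of the line, so a hit name followed by a free-text description should still parse, but real .hhr files may have layouts this regex does not cover.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hit-table rows copied from the question (summary section of the .hhr file).
HHR_ROWS = """\
  1 cluster_id_124 100.0   1E-42 6.7E-46  242.0   0.0  201   13-221   101-350 (396)
  2 cluster_id_124 100.0 1.6E-42   1E-45  241.0   0.0  202    7-219    48-261 (396)
  6 cluster_id_124 100.0 9.2E-37 6.1E-40  211.5   0.0  198   11-218   142-391 (396)
"""

# The Hit field is matched lazily and the numeric columns must run to the end
# of the line, so any description text after the hit name stays in "hit".
ROW_RE = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<hit>.+?)\s+(?P<prob>[\d.]+)\s+(?P<evalue>\S+)\s+"
    r"(?P<pvalue>\S+)\s+(?P<score>-?[\d.]+)\s+(?P<ss>-?[\d.]+)\s+(?P<cols>\d+)\s+"
    r"(?P<q_start>\d+)-(?P<q_end>\d+)\s+(?P<t_start>\d+)-(?P<t_end>\d+)\s*"
    r"\((?P<t_len>\d+)\)\s*$"
)


def parse_rows(text):
    """Return one dict per hit-table row; the hit ID is the first token of Hit."""
    hits = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue  # header, blank, or alignment lines
        hits.append({
            "no": int(m["no"]),
            "hit": m["hit"].split()[0],
            "prob": float(m["prob"]),
            "query": (int(m["q_start"]), int(m["q_end"])),
            "template": (int(m["t_start"]), int(m["t_end"])),
        })
    return hits


def overlap(a, b):
    """Positions shared by two inclusive (start, end) ranges; 0 if disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def group_by_hit(hits):
    """Collect rows that point at the same database HMM."""
    groups = defaultdict(list)
    for h in hits:
        groups[h["hit"]].append(h)
    return dict(groups)


if __name__ == "__main__":
    for hit, rows in group_by_hit(parse_rows(HHR_ROWS)).items():
        for a, b in combinations(rows, 2):
            print(f"{hit}: rows {a['no']} vs {b['no']} share "
                  f"{overlap(a['query'], b['query'])} query and "
                  f"{overlap(a['template'], b['template'])} template positions")
```

For the three rows in the question this reports 206 to 208 shared query positions for every pair, but only 120 to 209 shared template positions, because the template ranges are shifted against each other. That pattern suggests the rows are not separate domains of the query; it may be more consistent with repeated or internally similar segments in the template (the "multiple internal matches" case). Treat it as a prompt to look at the actual alignments rather than as a conclusion.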

It's also quite common for the same PDB ID to come up when matching structures with multiple similar or identical chains. For example, match 1 might be PDB ID 123A chain A and match 2 might be PDB ID 123A chain B, but both would come up as 123A.
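
If you want to count hits per structure rather than per chain, a small helper can strip the chain suffix. This is a hypothetical Python sketch: it assumes PDB-derived hit names of the form <PDB ID>_<chain> (such as 123A_A), which is a naming assumption not stated in this thread, and the example names are invented.

```python
from collections import Counter


def pdb_id(hit_name: str) -> str:
    """'123A_B' -> '123A'; names without an underscore are returned unchanged.

    Assumes a '<PDB ID>_<chain>' naming scheme (an assumption for this sketch).
    """
    base, sep, _chain = hit_name.partition("_")
    return base if sep else hit_name


if __name__ == "__main__":
    hits = ["123A_A", "123A_B", "4XYZ_A"]  # invented hit names for illustration
    print(Counter(pdb_id(h) for h in hits))  # Counter({'123A': 2, '4XYZ': 1})
```

Only apply this to PDB-style names; custom identifiers such as cluster_id_124 contain underscores of their own and would be truncated incorrectly.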

That makes sense, thank you for the explanation, jrj.healey!
