Question

Identify multidomain protein after hmmscan

10.5 years ago

dago ★ 2.8k

I am having trouble choosing an appropriate criterion for identifying the presence of multiple domains in proteins.

I ran an hmmscan search on a list of proteins using the --tblout flag.

The output reports several fields:

--- full sequence ---- --- best 1 domain ---- --- domain number estimation ----
# ..E-value  score  bias   E-value  score  bias   exp reg clu  ov env dom rep inc...
------------------- ---------- -------------------- ---------- --------- ------

Reading the manual, I think the first values to check are the E-values of both the full sequence and the best 1 domain. If the second is significantly lower than the full-sequence E-value, the results for this protein should be considered carefully.

I also understand that the reported domains are ordered by statistical significance, so the first one is the most likely to be present. What I do not understand is which parameter to use when deciding whether I am dealing with a multi-domain protein. Should I consider just the "exp" value?
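For reference, the columns mentioned above can be pulled out of a --tblout file with a short script. This is only a minimal sketch: it assumes the usual whitespace-delimited layout with 18 fixed columns followed by a free-text description, and the function names and the sample line are invented for illustration, not real hmmscan output.

```python
def parse_tblout_line(line):
    """Split one non-comment --tblout line into the fields discussed above."""
    # 18 fixed columns, then a free-text description that may contain spaces.
    parts = line.split(None, 18)
    return {
        "target": parts[0],
        "query": parts[2],
        "full_evalue": float(parts[4]),      # full sequence E-value
        "best_dom_evalue": float(parts[7]),  # best 1 domain E-value
        "exp": float(parts[10]),             # expected number of domains
        "dom": int(parts[15]),               # number of domains defined
        "inc": int(parts[17]),               # domains passing inclusion thresholds
    }


def read_tblout(lines):
    """Parse every data line, skipping blank lines and '#' comment lines."""
    return [
        parse_tblout_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


# Invented example line (hypothetical names and numbers).
sample = [
    "# comment line",
    "FamA PF00000.0 protX - 1e-30 100.0 0.1 2e-15 50.0 0.0 2.1 2 0 0 2 2 2 2 Hypothetical family",
]
hits = read_tblout(sample)
```

With the fields in a dictionary, it is easy to list the full-sequence and best-domain E-values side by side for each hit, together with "exp" and "dom".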

hmmscan protein-domain • 2.5k views

updated 3.3 years ago by Ram 45k • written 10.5 years ago by dago ★ 2.8k

Ram · Answer 1 · 2015-03-13

Just run hmmscan with the individual family profiles (the most significant one and the next one are enough) without the --tblout flag (and without --noaliremove, if you used that previously). If you find a continuous gap in the alignment with the first profile, and that gap is filled by the second profile, the protein is multidomain. If the protein is multidomain, the two families containing those domains come up as the first and second hits in almost every case.
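The check described above can be roughly approximated in code by comparing where the two profiles align on the protein. This is a simplified sketch, not the exact gap inspection: it assumes you already have the start and end positions of each hit on the query (for example, read by eye from the alignments or taken from the coordinates in a --domtblout file), and the 0.2 overlap cutoff is an arbitrary value chosen for illustration.

```python
def overlap_fraction(a, b):
    """Fraction of the shorter interval that is shared by both intervals.

    a and b are (start, end) positions on the protein, 1-based and inclusive.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    shorter = min(a_end - a_start, b_end - b_start) + 1
    return max(shared, 0) / shorter


def looks_multidomain(first_hit, second_hit, max_overlap=0.2):
    """True if the two hits cover mostly separate regions of the protein.

    max_overlap is a hypothetical threshold, not an hmmscan parameter.
    """
    return overlap_fraction(first_hit, second_hit) <= max_overlap


# Two families aligned to separate stretches of the sequence.
separate = looks_multidomain((10, 120), (130, 250))
# Two families competing for the same stretch of the sequence.
same_region = looks_multidomain((10, 120), (15, 118))
```

Hits that land on the same stretch are more likely two related profiles matching a single domain, while hits on separate stretches point to a multidomain protein.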