5.4 years ago
yp19
Hi all!
I've run both InterProScan and HMMER because I'm interested in Pfam hits for a set of proteins. InterProScan was run like so:
./interproscan.sh -i my_prot.faa -f tsv
and then I filtered the results for Pfam hits with E-value < 0.001.
HMMER was run like so:
hmmscan --tblout hmmer_result.txt -E 0.001 Pfam-A.hmm my_prot.faa
I compared the outputs and, for some reason, InterProScan finds fewer proteins with Pfam hits than HMMER (approximately 150 fewer). I checked, and both are using the most recent Pfam database (32.0), so I'm not sure why this is happening. Any ideas?
Thank you!
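To make the comparison concrete, here is a minimal Python sketch of how the two protein sets could be built and diffed. It assumes the standard InterProScan TSV layout (protein ID in column 1, analysis name in column 4, E-value or score in column 9) and the standard hmmscan `--tblout` layout (query name in column 3, full-sequence E-value in column 5). The function names and the InterProScan output file name `my_prot.faa.tsv` are illustrative assumptions, not taken from the post.

```python
def interproscan_pfam_proteins(lines, max_evalue=1e-3):
    """Protein IDs with at least one Pfam match at or below max_evalue.

    Assumes InterProScan TSV columns: 1 = protein ID, 4 = analysis, 9 = E-value.
    """
    proteins = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[3] != "Pfam":
            continue
        try:
            evalue = float(fields[8])
        except ValueError:
            continue  # some analyses report "-" instead of a number
        if evalue <= max_evalue:
            proteins.add(fields[0])
    return proteins


def hmmscan_proteins(lines, max_evalue=1e-3):
    """Query protein IDs with at least one hit at or below max_evalue.

    Assumes hmmscan --tblout columns: 3 = query name, 5 = full-sequence E-value.
    """
    proteins = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.split()
        if len(fields) < 5:
            continue
        if float(fields[4]) <= max_evalue:
            proteins.add(fields[2])
    return proteins


def compare(interproscan_path, hmmer_path, max_evalue=1e-3):
    """Return (only_in_hmmer, only_in_interproscan) as two sets of protein IDs."""
    with open(interproscan_path) as handle:
        ips = interproscan_pfam_proteins(handle, max_evalue)
    with open(hmmer_path) as handle:
        hmm = hmmscan_proteins(handle, max_evalue)
    return hmm - ips, ips - hmm
```

Calling `compare("my_prot.faa.tsv", "hmmer_result.txt")` would then give the proteins reported by only one of the two tools, which is a more useful starting point than the raw difference in counts.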
Do you know the exact HMMER command InterProScan is using? Do you know if and how InterProScan filters its input and output? You may have to dig into the InterProScan logs to find these details.
Thank you. No, I do not know the exact command. I figured the output was not filtered, since I have some large E-values (e.g. 40). Do you know where I can find these logs? I made it this far: https://github.com/ebi-pf-team/interproscan/tree/master/core but I'm not sure where to go from here.
My guess is that the multiple-hypothesis correction is different: InterProScan probably scans more profiles and therefore applies a stronger correction. Can you check whether the E-values correspond between the two tools? Are you losing the high E-value results?
Thank you for the suggestion. Yes, it looks like I am losing the high E-value results. However, only 30 of these proteins have large (> 0.001) E-values, and I am missing 150 proteins in total (in comparison to HMMER). Perhaps InterProScan is doing some filtering on the E-values (after multiple testing).
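One way to separate the two cases described here (proteins that InterProScan reports only above the cutoff versus proteins with no Pfam line at all) is sketched below in Python. It assumes the same InterProScan TSV layout as the standard documentation describes (protein ID in column 1, analysis in column 4, E-value in column 9); the function names are hypothetical, and `hmmer_ids` stands for whatever set of protein IDs was collected from the HMMER table.

```python
def best_pfam_evalues(interproscan_lines):
    """Map each protein ID to its lowest Pfam E-value in InterProScan TSV output."""
    best = {}
    for line in interproscan_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[3] != "Pfam":
            continue
        try:
            evalue = float(fields[8])
        except ValueError:
            continue  # no numeric score reported on this line
        protein = fields[0]
        best[protein] = min(evalue, best.get(protein, evalue))
    return best


def classify_missing(hmmer_ids, best, cutoff=1e-3):
    """Split HMMER-positive proteins that InterProScan does not confirm.

    Returns (above_cutoff, absent):
      above_cutoff: dict of protein ID -> best InterProScan Pfam E-value > cutoff
      absent: set of protein IDs with no Pfam line in the InterProScan output
    Proteins whose best E-value is at or below the cutoff are found by both
    tools and are left out of both results.
    """
    above_cutoff = {}
    absent = set()
    for protein in hmmer_ids:
        if protein not in best:
            absent.add(protein)
        elif best[protein] > cutoff:
            above_cutoff[protein] = best[protein]
    return above_cutoff, absent
```

If most of the roughly 150 proteins end up in `absent` rather than `above_cutoff`, the difference would point to hits being dropped before the TSV is written, rather than to a shift in the reported E-values.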
On this basis, you should be able to reduce the correction applied to the InterProScan results by confining your search to the Pfam library only.
For example:
./interproscan.sh -appl Pfam -i /path/to/sequences.fasta -f tsv
Please do not delete posts. The purpose of this site is two-fold: more immediately, to help people with their questions; but on the long run, to serve as a repository of knowledge. The second purpose is defeated if people delete their questions.