How to keep the top hits only in the output file of hmmscan?
4.3 years ago
A_heath ▴ 170

Hi all,

I recently downloaded HMMER to run hmmscan locally on Linux against a Pfam database. It works great, but in my opinion the output files are quite difficult to read quickly.

I tried output options such as --tblout, --domtblout and --pfamtblout, but the output files are still voluminous.

I would like to keep only the top hits in my output files.

I've seen that this is possible with hmmsearch, so I was wondering whether there is something similar for hmmscan. Ideally, I would like an output like the one I could get online: cf. here

If you have any suggestions, I'll gladly take them. Thank you in advance for your much appreciated help!

hmmer hmmscan • 3.8k views
4.3 years ago
A_heath ▴ 170

For anyone interested, I figured it out using this amazing resource: http://slhogle.github.io/2015/remove-duplicate-lines/ and the option --tblout of hmmscan.

I did:

hmmscan --tblout output_file.pfam Pfam-A.hmm seq_file.fasta

and then:

awk '!x[$3]++' output_file.pfam > MYBESTHITS.pfam

The MYBESTHITS.pfam file is basically what I got online: the top hit for each protein sequence.
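
A small variation, in case it helps: the one-liner above also looks at the `#` header and footer lines that --tblout writes, so some of those comment lines can end up in the result. Here is a minimal sketch that skips them first. The function name `best_hits` is just a hypothetical helper, and it assumes, like the command above, that column 3 is the query name and that the hits for each query are listed best-first.

```shell
# Keep only the first (best-ranked) hit per query from an hmmscan --tblout file.
# Assumption: column 3 holds the query name and hits are listed best-first.
best_hits() {
    # Skip comment lines, then print a line only the first time its
    # query name (column 3) is seen.
    awk '!/^#/ && !seen[$3]++' "$1"
}

# Example with the file names used above:
# best_hits output_file.pfam > MYBESTHITS.pfam
```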

Hey! You mentioned you know how to find the top hits from hmmscan. Could you share how you do this?

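I'll leave a clarification, since the link doesn't work.

In the table generated by hmmscan --tblout, the hits for each query sequence are grouped together, with the best-scoring hit at the top of the group. The awk command prints a line only the first time it sees a given value in column 3 (the query name), so it keeps just the top hit for each sequence.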
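
If you are not sure the table is still in that order (for example after sorting or concatenating several --tblout files), a sketch like the following picks the hit with the highest full-sequence bit score per query instead of relying on position. The function name `best_by_score` is hypothetical, and it assumes the standard --tblout layout where column 3 is the query name and column 6 is the full-sequence score.

```shell
# Pick the highest-scoring hit per query, independent of line order.
# Assumption: column 3 = query name, column 6 = full-sequence bit score.
best_by_score() {
    awk '
        /^#/ { next }                          # skip comment/header lines
        {
            q = $3; s = $6 + 0                 # query name, numeric score
            if (!(q in best)) order[++n] = q   # remember first-seen order
            if (!(q in best) || s > top[q]) { best[q] = $0; top[q] = s }
        }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$1"
}

# Example:
# best_by_score output_file.pfam > MYBESTHITS.pfam
```

Queries are printed in the order they first appear in the file, one line each.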
