Question

Speed of hmmsearch

9.4 years ago

kentnf ▴ 10

Hi,

I used hmmbuild to build an HMM file containing 89 domains, and I am searching all Arabidopsis proteins against those 89 domains with hmmsearch:

hmmsearch -Z 2000000 --domZ 89 --cpu 20 -o output.hmmsearch.txt stockholm89.hmm at_pep

The search finished in less than 2 minutes.

To speed up the search, I selected only the 1,300 proteins of interest and ran the same command on that subset. That run takes about 20 minutes.

I am using the latest version of HMMER. Does anyone know what causes this? Is it a bug in hmmsearch?

Thanks

software error • 3.3k views

ADD COMMENT • link updated 9.0 years ago by Biostar 20 • written 9.4 years ago by kentnf ▴ 10

Are you saying that this took 2 minutes:

hmmsearch -Z 2000000 --domZ 89 --cpu 20 -o output.hmmsearch.txt stockholm89.hmm at_pep_N_seqs

While this took 20 minutes:

hmmsearch -Z 2000000 --domZ 89 --cpu 20 -o output.hmmsearch.txt stockholm89.hmm at_pep_1300_seqs

where at_pep_1300_seqs is a subset of 1,300 sequences taken from the N sequences in at_pep_N_seqs? If yes, that sounds really weird.

BTW, if you have enough RAM and fast I/O, the approach below can be a lot faster than what you're doing. Split the input file into 20 parts and then run the following (GNU parallel has to be in $PATH):

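# Run hmmsearch on one split file with a single CPU;
# GNU parallel supplies the parallelism across files.
function hmmer() {
    hmmsearch -Z 2000000 --domZ 89 --cpu 1 -o "$1.output.hmmsearch.txt" stockholm89.hmm "$1"
}

# Export the function so the bash shells started by parallel can call it.
export -f hmmer
find /where/the/split/files/are/ -maxdepth 1 -type f -name "*specific2splitFiles" | parallel -j 20 hmmer {}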
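
The splitting step mentioned above is not shown, so here is a minimal sketch of one way to do it, assuming the input is a plain FASTA file. The function name split_fasta and the output naming scheme (prefix.partNN.fa) are hypothetical and not from the original post; the -name pattern in the find command would need to match whatever names you choose. Records are dealt out round-robin, which balances the number of sequences per part but not necessarily the number of residues.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a FASTA file into N parts, round-robin by record.
# Usage: split_fasta <input.fasta> <number_of_parts> <output_prefix>
# Writes <output_prefix>.part00.fa, <output_prefix>.part01.fa, ...
split_fasta() {
    local input=$1 parts=$2 prefix=$3
    awk -v n="$parts" -v prefix="$prefix" '
        # A header line starts a new record: pick the next output file in turn.
        /^>/ { out = sprintf("%s.part%02d.fa", prefix, count % n); count++ }
        # Write header and sequence lines to the current output file;
        # anything before the first header is skipped.
        out != "" { print > (out) }
    ' "$input"
}

# Example call (the output prefix here is an assumption):
# split_fasta at_pep 20 /where/the/split/files/are/at_pep
```

Each part stays a valid FASTA file because a record's header and all of its sequence lines go to the same output. Some awk implementations cap the number of simultaneously open files, so for a much larger number of parts another splitting tool may be the safer choice.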

ADD REPLY • link 9.0 years ago by 5heikki 11k