Question

Building Ensembl database for HMMER

9 months ago

J • 0

Hello,

I want to run HMMER (specifically JACKHMMER) on my local workstation, as the EBI website (https://www.ebi.ac.uk/Tools/hmmer/) is becoming increasingly unstable, particularly for JACKHMMER searches. I have installed and tested it, and it is working fine.

I want to be able to search the whole Ensembl Genomes Bacteria database, as you can from the website. I have downloaded the FASTA-formatted protein sequences for a test batch of bacterial genomes from the Ensembl FTP server (https://ftp.ebi.ac.uk/ensemblgenomes/pub/bacteria/release-58/), but I don't understand how to build a database that lets me search through all of them.

In the HMMER tutorial, it seems I just specify my query protein sequence and then the database in FASTA format. So I have two questions:

1. How do I build a database to provide as the argument to the jackhmmer command? Do I simply concatenate the FASTA files from each genome assembly into a single file?
2. What is the best way to do this so that I have easy access to metadata such as taxid and organism name? Or do I just have to parse the HMMER output and search the raw Ensembl data again myself to get this?
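
To make the first question concrete, the concatenation idea could be sketched roughly as below. This is only an illustration of what I have in mind, not a confirmed recipe: the function names (`open_text`, `build_database`) and the side table of sequence ID, source file and header description are my own assumptions, and it relies on the text after `>` up to the first space being the sequence identifier. The taxid itself is not extracted here; it would have to come from a separate lookup keyed on the source file.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def build_database(fasta_paths, db_path, meta_path):
    """Concatenate FASTA files into db_path.

    Also writes a tab-separated side table (meta_path) with one row per
    sequence: sequence ID, source file name, header description.
    Returns the number of sequences written.
    """
    count = 0
    with open(db_path, "w") as db, open(meta_path, "w") as meta:
        meta.write("seq_id\tsource_file\tdescription\n")
        for path in fasta_paths:
            with open_text(path) as handle:
                last = "\n"
                for line in handle:
                    if line.startswith(">"):
                        # Assumption: ID is the header text up to the first space.
                        header = line[1:].rstrip()
                        seq_id, _, desc = header.partition(" ")
                        meta.write(f"{seq_id}\t{Path(path).name}\t{desc}\n")
                        count += 1
                    db.write(line)
                    last = line
                # Guard against files that lack a trailing newline, so the
                # next file's header does not get glued onto a sequence line.
                if not last.endswith("\n"):
                    db.write("\n")
    return count
```

The combined file would then be passed as the second positional argument, e.g. `jackhmmer --tblout hits.tbl query.fasta bacteria_all.fasta` (the file names here are placeholders), and the target names in the tabular output could be joined against the side table to see which genome file each hit came from.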

Many thanks!

hmmer • 250 views

updated 9 months ago by Ram 44k • written 9 months ago by J • 0