Hello,
I want to run HMMER (specifically JACKHMMER) on my local workstation, because the EBI website (https://www.ebi.ac.uk/Tools/hmmer/) is becoming increasingly unstable, particularly for JACKHMMER searches. I have installed and tested it, and it is working fine.
What I want to do is run a search against the whole Ensembl Genomes Bacteria database, as you can on the website. I have downloaded the FASTA-formatted protein sequences for a test batch of bacterial genomes from the Ensembl FTP server (https://ftp.ebi.ac.uk/ensemblgenomes/pub/bacteria/release-58/), but I don't understand how to build a single database that lets me search all of them.
In the HMMER tutorial, it seems I just specify my query protein sequence followed by the database in FASTA format. So I have two questions:
- How do I build the database to pass as the argument to the jackhmmer command? Do I simply concatenate the FASTA files from every genome assembly into a single file?
- What is the best way to do this so that I can easily access metadata such as the taxid and organism name? Or do I have to parse the HMMER output and look up each hit in the raw Ensembl data myself?
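A minimal Python sketch of the concatenation idea from the first question might look like the following. It is untested against real Ensembl files and rests on assumptions: one genome per FASTA file, the file name (minus `.gz`/`.fa`/`.fasta`/`.faa`) used as a hypothetical genome label, and a `|` separator of my own choosing. Each sequence ID is rewritten as `<label>|<original ID>`. HMMER takes a sequence's name to be the text up to the first whitespace in the header, so the label travels with every hit into the first column of a `--tblout` table. The helper names (`open_text`, `genome_label`, `merge_fasta`, `hit_genomes`) are illustrative only.

```python
import gzip
import re
from pathlib import Path
from typing import Dict, Iterable, List, TextIO

# Hypothetical separator between the genome label and the original protein ID.
SEP = "|"


def open_text(path: Path) -> TextIO:
    """Open a plain or gzip-compressed FASTA file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def genome_label(path: Path) -> str:
    """Derive a genome label from the file name (assumes one genome per file)."""
    name = path.name
    for suffix in (".gz", ".fa", ".fasta", ".faa"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    # The sequence name ends at the first whitespace, so keep the label whitespace-free.
    return re.sub(r"\s+", "_", name)


def merge_fasta(inputs: Iterable[Path], out: TextIO) -> Dict[str, str]:
    """Write every record to `out`, renaming each ID to '<label>|<original ID>'.

    Returns a map from the new ID to its source file, usable as a lookup table.
    """
    id_to_source: Dict[str, str] = {}
    for path in inputs:
        label = genome_label(path)
        with open_text(path) as fh:
            for line in fh:
                if not line.endswith("\n"):
                    line += "\n"  # stop the next file being glued onto this line
                if line.startswith(">"):
                    seq_id, _, desc = line[1:].rstrip("\n").partition(" ")
                    new_id = f"{label}{SEP}{seq_id}"
                    id_to_source[new_id] = str(path)
                    line = f">{new_id} {desc}\n" if desc else f">{new_id}\n"
                out.write(line)
    return id_to_source


def hit_genomes(tblout_path: Path) -> List[str]:
    """Read target names (first column) from a --tblout table; return genome labels."""
    labels: List[str] = []
    with open(tblout_path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip comment/header lines and blanks
            target = line.split()[0]
            labels.append(target.split(SEP, 1)[0])
    return labels
```

Usage would be something like `merge_fasta(sorted(Path("genomes").glob("*.fa.gz")), out)`, with `out` opened for writing on a file such as `bacteria_db.fa` (hypothetical names), followed by `jackhmmer --tblout hits.tbl query.fa bacteria_db.fa`. Taxid and organism name would still need to come from a separate table keyed by genome label, which this sketch does not build.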
Many thanks!