Question

Pipelines/packages for protein domain annotation?


8 months ago

John • 0

Hi, I'm currently generating new bacterial genomes and I want to identify all ankyrin-domain-containing proteins (and the number of repeats each one has). What are some approaches I can use for this?

Given the scale, I don't want to use web tools. I've been looking at the HMMER documentation, which looks quite complicated to parse, so I'm hoping there is a simpler way.

Ideally, I'd like to process a gbff file, but I can always convert between file types.

Thank you!

Note: I can't just extract this from the annotation information, because bakta, the package I'm using, doesn't provide it in all cases.

annotation protein domains • 351 views


score 1 · Answer 1 · 2024-03-14

What exactly is so difficult in the HMMER documentation? Individual preferences vary, but I always thought that HMMER had one of the cleanest manuals out there.

There is no simpler way to annotate the presence of a single domain in a protein database than to use hmmsearch. As you found out, using automatic annotation tools like prokka or bakta has its own difficulties. What could be simpler than:

hmmsearch -E 0.01 -o output_file.txt ank.hmm protein_database.faa

In order, I am setting an E-value threshold (-E 0.01), an output file for the search results (-o output_file.txt), the HMM file that will be used for searching (ank.hmm) and the protein database to search (protein_database.faa). There are other options, such as --cpu to use more CPU cores and speed up the search, but the above command is all it takes. What's left for you is to find an ankyrin domain HMM and off you go.
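Since the question also asks for the number of repeats per protein, here is a rough sketch of one way to get at that. It assumes you add --domtblout domains.tbl to the command above, which makes hmmsearch write a whitespace-delimited table with one row per domain hit. The function name count_domain_hits, the 0.01 cutoff on the independent E-value, and the sample rows are all made up for illustration; they are not part of HMMER.

```python
from collections import defaultdict


def count_domain_hits(domtblout_lines, max_i_evalue=0.01):
    """Count domain hits per target protein in hmmsearch --domtblout output.

    count_domain_hits is a hypothetical helper, not part of HMMER.
    Each non-comment row describes one domain hit; column 1 is the target
    name and column 13 is the independent E-value (i-Evalue) of that hit.
    """
    counts = defaultdict(int)
    for line in domtblout_lines:
        # Skip blank lines and the '#' header/footer lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target_name = fields[0]
        i_evalue = float(fields[12])
        # Assumed cutoff: keep only hits at or below max_i_evalue.
        if i_evalue <= max_i_evalue:
            counts[target_name] += 1
    return dict(counts)


# Invented rows in domtblout column order, for illustration only.
sample = [
    "# target name  accession  tlen  query name  accession  qlen  ...",
    "protA - 350 ank - 33 1.2e-20 70.1 0.5 1 3 1.0e-08 2.0e-06 25.0 0.1 1 33 10 42 9 43 0.95 hypothetical protein",
    "protA - 350 ank - 33 1.2e-20 70.1 0.5 2 3 3.0e-09 6.0e-07 27.0 0.0 1 33 45 77 44 78 0.96 hypothetical protein",
    "protA - 350 ank - 33 1.2e-20 70.1 0.5 3 3 2.5e-02 5.0e+00 5.1 0.0 2 20 90 108 89 115 0.70 hypothetical protein",
    "protB - 210 ank - 33 4.0e-07 28.0 0.1 1 1 1.5e-09 3.0e-07 28.0 0.1 1 33 5 37 4 38 0.94 hypothetical protein",
]

if __name__ == "__main__":
    for protein, n_hits in sorted(count_domain_hits(sample).items()):
        print(protein, n_hits)
```

For a real run you would pass an open file instead of the sample list, e.g. with open("domains.tbl") as handle: count_domain_hits(handle). Treat the counts as an estimate: weak or degenerate repeats may fall below the cutoff, so the number of reported hits is not guaranteed to equal the true number of repeats.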