custom database in HH-suite
5.5 years ago
yp19 ▴ 70

Hi all!

Sorry in advance if this is a beginner question. I'm trying to make a custom database in HH-suite. I downloaded all virus orthogroups and their sequences. My question is: when I make the database, should I use all the orthogroups and their sequences together, or make one database per orthogroup? My end goal is to use this database in hhblits, and my queries are multiple sequence alignments of ortholog groups from a subset of virus species.

It will be important for me to know which proteins in my query are significant matches to which virus orthogroups. Is this information retained if I make the database using all orthogroups and their sequences together?

Thanks for any advice!

homology domain hh-suite HMM orthogroup

Please do not delete posts. The purpose of this site is two-fold: more immediately, to help people with their questions; but in the long run, to serve as a repository of knowledge. The second purpose is defeated if people delete their questions.

5.5 years ago
Joe 21k

I think you can just make a single database if I’ve understood the problem correctly.

The database will simply return the best matches, wherever they come from.

If you are concerned about separating the orthogroups, but your input query data is already separated, your results are naturally clustered according to the input data files, and you can filter/post-process the data however suits you.
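As an illustration of that kind of post-processing, here is a minimal sketch. It assumes you have already extracted `(query, target, E-value)` triples from the hhblits output yourself (no parser is shown), and that each database entry name starts with its orthogroup ID followed by a separator, e.g. `OG0001|protein_A`. That naming scheme and the function name `group_significant_hits` are hypothetical, and the E-value cutoff is just an example threshold.

```python
from collections import defaultdict


def group_significant_hits(hits, max_evalue=1e-3, sep="|"):
    """Group significant hits by query, then by orthogroup.

    hits: iterable of (query, target, evalue) tuples.
    Assumes (hypothetically) that each target name looks like
    "<orthogroup><sep><protein>", e.g. "OG0001|protein_A".
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for query, target, evalue in hits:
        if evalue > max_evalue:
            continue  # drop matches above the chosen E-value cutoff
        orthogroup = target.split(sep, 1)[0]  # text before the first separator
        grouped[query][orthogroup].append((target, evalue))
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {query: dict(ogs) for query, ogs in grouped.items()}


hits = [
    ("queryA", "OG0001|p1", 1e-10),
    ("queryA", "OG0002|p7", 0.5),  # not significant, filtered out
    ("queryB", "OG0001|p3", 1e-5),
]
print(group_significant_hits(hits))
# {'queryA': {'OG0001': [('OG0001|p1', 1e-10)]}, 'queryB': {'OG0001': [('OG0001|p3', 1e-05)]}}
```

Because the query is the outer key, results stay separated by input file/alignment even when everything is searched against one combined database.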

Thanks! Yeah, I was just worried about losing the orthogroup information if I create the database using all proteins. I would just use `cat` to join all the FASTA files from each orthogroup and then use that file to make the database.

Also, the MSA step in making the database would then be done over all sequences rather than per orthogroup, and I wasn't sure if that was OK or what people normally do.

Ah I see, I think I may have misunderstood in the first instance.

If all your input files are bundled together, then you will probably want to use separate orthogroup databases. This is probably not the most efficient approach, but it is likely the easiest.

I’m not super familiar with making custom databases, but depending on how HH-suite stores the sequence names etc., it may be possible to format your database sequences with the name of the orthogroup in the FASTA headers, in which case you should be able to see it when a hit comes up in hhblits. You’d need to consult the manual to know whether this is an option.
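A minimal sketch of the header-formatting idea, as a replacement for a plain `cat`: it concatenates per-orthogroup FASTA files into one file and prefixes every header with the orthogroup name, taken (by assumption) from each file name without its extension, e.g. `OG0001.fasta` gives `>OG0001|...`. The function name and the `|` separator are hypothetical choices; whether the prefix actually shows up in hhblits hit names depends on how the database is built from this file, so check the HH-suite manual.

```python
from pathlib import Path


def tag_and_concatenate(fasta_paths, out_path, sep="|"):
    """Concatenate FASTA files, prefixing each header with the file's stem.

    Assumes each input file holds the sequences of exactly one orthogroup
    and is named after it (hypothetical convention, e.g. "OG0001.fasta").
    """
    with open(out_path, "w") as out:
        for path in map(Path, fasta_paths):
            orthogroup = path.stem  # "OG0001.fasta" -> "OG0001"
            last = ""
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # ">protein_A desc" -> ">OG0001|protein_A desc"
                        line = f">{orthogroup}{sep}{line[1:]}"
                    out.write(line)
                    last = line
            # Unlike a bare `cat`, avoid gluing the next header onto a
            # final line that lacks a trailing newline.
            if last and not last.endswith("\n"):
                out.write("\n")


# Example: tag_and_concatenate(["OG0001.fasta", "OG0002.fasta"], "all_orthogroups.fasta")
```

Keeping the orthogroup ID at the start of each name also makes it trivial to recover later by splitting on the separator.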

Thanks for the suggestion! I will try to format headers and report back.
