What Is the Best Way to Find the Core Genome of Many Bacteria?
Asked 12.0 years ago by Naren

I can see two ways of finding the core genome of a set of bacteria.
Suppose we have five genomes (in practice I need to do this for around 30 genomes).

Method 1:
BLAST genome 1 against genome 2 to get a core set, then BLAST this core set against the next genome (genome 3),
then BLAST the resulting set against genome 4, and so on.
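A minimal Python sketch of Method 1, assuming a hypothetical `search(query_ids, genome)` wrapper around your BLAST run that returns the subset of query IDs with a significant hit in that genome. The toy hit table and gene IDs are made up; only the shrinking-set logic is the point.

```python
from typing import Callable, Iterable

# Hypothetical wrapper type: (query gene IDs, target genome) -> query IDs with a hit.
SearchFn = Callable[[set[str], str], set[str]]


def iterative_core(reference_genes: Iterable[str],
                   other_genomes: list[str],
                   search: SearchFn) -> set[str]:
    """Method 1: shrink the candidate core set one genome at a time."""
    core = set(reference_genes)
    for genome in other_genomes:
        if not core:  # nothing left to search with, so stop early
            break
        # Only genes that survived every previous round are queried again.
        core = search(core, genome)
    return core


if __name__ == "__main__":
    # Toy hit table standing in for real BLAST results (made-up IDs).
    toy_hits = {
        "g2": {"orf1", "orf2", "orf3"},
        "g3": {"orf1", "orf3"},
        "g4": {"orf1", "orf3", "orf4"},
    }

    def toy_search(queries: set[str], genome: str) -> set[str]:
        return queries & toy_hits[genome]

    print(sorted(iterative_core(["orf1", "orf2", "orf3", "orf4"],
                                ["g2", "g3", "g4"], toy_search)))
```

With the toy table this prints `['orf1', 'orf3']`: `orf4` is dropped after genome 2 and `orf2` after genome 3.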

Method 2:
BLAST genome 1 against each of genomes 2, 3, 4 and 5 separately.
Then find the genes common to all the outputs (1-2, 1-3, 1-4, 1-5, ...) using a Perl program I have that compares GI IDs.
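The "which genes are common to all outputs" step can be sketched in Python instead of the Perl GI-comparison script. This assumes each pairwise run used genome 1 as the query and BLAST+ tabular output (`-outfmt 6`, default 12 columns, E-value in column 11); the 1e-5 cutoff and the toy hit lines are arbitrary illustrations, not recommendations.

```python
import io
from typing import Iterable, TextIO

EVALUE_COL = 10  # 0-based index of "evalue" in the default -outfmt 6 columns


def query_ids_with_hits(handle: TextIO, max_evalue: float = 1e-5) -> set[str]:
    """Collect query IDs (column 1) with at least one hit at or below max_evalue."""
    ids = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COL]) <= max_evalue:
            ids.add(fields[0])
    return ids


def pairwise_core(hit_tables: Iterable[TextIO], max_evalue: float = 1e-5) -> set[str]:
    """Method 2: genome-1 genes with a hit in every pairwise comparison."""
    sets = [query_ids_with_hits(h, max_evalue) for h in hit_tables]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Two tiny made-up hit tables (1-2 and 1-3) in -outfmt 6 layout.
    t12 = io.StringIO("orf1\tb7\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550\n"
                      "orf2\tb9\t40.0\t120\t70\t2\t5\t120\t3\t118\t0.5\t30\n")
    t13 = io.StringIO("orf1\tc3\t97.0\t300\t9\t0\t1\t300\t1\t300\t1e-140\t520\n"
                      "orf2\tc4\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-80\t300\n")
    print(sorted(pairwise_core([t12, t13])))
```

Here `orf2` fails the cutoff in the 1-2 table, so only `['orf1']` is printed.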

*Which of these methods is more accurate and less time-consuming when applied to 30 genomes?*

Answer by Neilfws, 12.0 years ago

Can I suggest, again as for your previous question, that CD-HIT might be a good tool for this task? A quick Google search for "core genome" + CD-HIT suggests that I'm not the first to have this idea.

+1 for CD-HIT, with the caveat that I know it mainly from clustering sequencing reads into operational taxonomic units. But if we think of predicted genes (ORFs in this case? I'm not sure what a "core set" is in this example: genes? syntenic regions?), you could cluster the "hits" into those shared by all genomes and those unique to each genome. I'm not sure how long that would take computationally, as I've only used CD-HIT on amplicon data from a single gene family.

I guess I would just take all predicted ORFs in FASTA format, concatenate them into one file, run CD-HIT on it, then parse the output and link the members of each cluster back to their organism.
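A rough Python sketch of the "parse the output and link members back to their organism" step, reading CD-HIT's `.clstr` file. It assumes every ORF ID was prefixed with a genome tag before concatenation (a hypothetical `genome|orf` naming scheme) and that CD-HIT was run with `-d 0` so full IDs appear in the `.clstr` file. "Core" here means a cluster with at least one member from every genome; it does not check for single-copy genes or paralogs.

```python
import io
import re
from collections import defaultdict
from typing import TextIO

# Member lines look like: "1\t310aa, >gB|orf7... at 85.00%"
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(handle: TextIO) -> list[list[str]]:
    """Return one list of sequence IDs per ">Cluster N" block."""
    clusters: list[list[str]] = []
    for line in handle:
        if line.startswith(">Cluster"):
            clusters.append([])
        else:
            m = MEMBER_RE.search(line)
            if m and clusters:
                clusters[-1].append(m.group(1))
    return clusters


def classify(clusters: list[list[str]], genomes: set[str], sep: str = "|"):
    """Split clusters into core (all genomes present) and genome-unique ones."""
    core: list[list[str]] = []
    unique: dict[str, list[list[str]]] = defaultdict(list)
    for members in clusters:
        present = {seq_id.split(sep, 1)[0] for seq_id in members}
        if genomes <= present:
            core.append(members)
        elif len(present) == 1:
            unique[next(iter(present))].append(members)
    return core, dict(unique)


if __name__ == "__main__":
    sample = io.StringIO(
        ">Cluster 0\n"
        "0\t312aa, >gA|orf1... *\n"
        "1\t310aa, >gB|orf7... at 85.00%\n"
        ">Cluster 1\n"
        "0\t150aa, >gA|orf2... *\n"
    )
    core, unique = classify(parse_clstr(sample), {"gA", "gB"})
    print(core)
    print(unique)
```

For the toy input, cluster 0 is reported as core and cluster 1 as unique to `gA`.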

+1 for CD-HIT.

Answer by JC, 12.0 years ago

I don't see any difference between the two proposed methods in the list of core genes they produce, because you keep the search space constant, and the search space is what determines the BLAST E-values; of course, others may disagree. In any case, Method 1 will be faster, because each iteration removes genes from the query set you search against the next genome.
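The speed argument can be made concrete by counting how many query sequences each method submits to BLAST. This is a back-of-the-envelope sketch: the gene counts below are hypothetical, and it ignores everything except the number of queries (the subject genome searched in each round is the same for both methods).

```python
def query_counts(n_reference_genes: int,
                 survivors_per_round: list[int]) -> tuple[int, int]:
    """Return (Method 1, Method 2) totals of query sequences sent to BLAST.

    survivors_per_round[i] is the core-set size after comparing against
    the (i + 2)-th genome. Method 1 queries the shrinking set each round;
    Method 2 always queries every gene of genome 1.
    """
    n_targets = len(survivors_per_round)
    # Round 1 queries all genes; later rounds query the previous survivors.
    method1_queries = ([n_reference_genes] + survivors_per_round[:-1])[:n_targets]
    return sum(method1_queries), n_reference_genes * n_targets


if __name__ == "__main__":
    # Hypothetical: 4000 genes in genome 1, core shrinking over 3 target genomes.
    print(query_counts(4000, [3000, 2600, 2400]))  # (9600, 12000)
```

The gap grows with the number of genomes, since Method 2 pays the full gene count for every additional genome.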
