Question

Comparative Microbial Genomics

12.3 years ago

lgbi

Hi all,

Our lab has sequenced 20 different strains of the same bacterial species. I now want to analyse these data, i.e., link genomic data to virulence (some of the strains are virulent, others aren't) and other phenotypic differences.

I started with de novo assemblies and mapping to an annotated reference genome. However, it is unclear to me how best to proceed from here. Most of the comparative genomics material I find on the web covers only the human genome or pairwise comparisons of bacterial genomes.

Is there a standard workflow for this kind of analysis? Does anybody know of any good tutorials/references/articles? How would you proceed?

comparative genomics

Written 12.3 years ago by lgbi; updated 5.8 years ago by Biostar.

I have marked the two most upvoted answers as accepted, since the OP has not been seen in over 5 years.

Reply by Ram, 5.8 years ago.

Multiple genome alignments (e.g. with Mauve) may give you insight into syntenic regions and recombination events, which are often linked to pathogenic potential.

Reply by Naren, 8.7 years ago.

12.3 years ago

RossCampbell

Sam Karlin did a lot of work developing fairly basic mathematical methods to identify pathogenicity islands in bacteria. One of his reviews can be found here.

Written 12.3 years ago by RossCampbell; updated 5.8 years ago by Ram.

12.3 years ago

Biesterfeld

I am not sure I have understood your question exactly, but you could have a look at OGeR; it may fit your needs.

Written 12.3 years ago by Biesterfeld; updated 5.8 years ago by Ram.

12.0 years ago

Nikolay Vyahhi

SiBELia (Synteny Block ExpLoration tool) can compare multiple bacterial genomes and visualize the comparison in a useful way; see the description or the code.

Written 12.0 years ago by Nikolay Vyahhi; updated 5.8 years ago by Ram.

8.7 years ago

dago

There are several things you could do, and it is hard to cover all of them here.

First, I would look at the phylogenetic relationships among the strains, e.g. based on 16S rRNA, a core-genome phylogeny, or SNPs.
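A core-genome SNP phylogeny usually starts from a pairwise SNP distance matrix. The sketch below assumes you already have a core-genome alignment loaded as a dict of strain name to aligned sequence (how you build that alignment, and which tree method you feed the distances into, are left open); positions with a gap or ambiguous base in either sequence are skipped for that pair.

```python
from itertools import combinations


def snp_distances(alignment):
    """Pairwise SNP counts between aligned sequences, ignoring gaps and ambiguous bases."""
    valid = set("ACGT")
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    names = sorted(alignment)
    dist = {(n, n): 0 for n in names}
    for a, b in combinations(names, 2):
        d = sum(
            1
            for x, y in zip(alignment[a].upper(), alignment[b].upper())
            if x in valid and y in valid and x != y
        )
        # The matrix is symmetric, so store both orientations.
        dist[(a, b)] = dist[(b, a)] = d
    return dist
```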

This implies that you would also need to calculate the core and pan genome. This analysis will also tell you which genes are uniquely present in specific strains. Looking into them may give you some insight into potentially unique features, e.g. virulence.
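Once genes have been clustered into orthologous families (with whichever orthology tool you prefer), the core genome, the pan genome and the phenotype-specific families reduce to set operations. The sketch assumes a hypothetical `presence` mapping of strain to the set of family IDs it carries. The strict rule "present in every virulent strain and absent from every other" is an assumption; with real assemblies you will probably want to relax it to tolerate missing or mis-annotated genes.

```python
def pan_core(presence):
    """Return (core, pan) gene families from a strain -> set-of-family-IDs mapping."""
    families = list(presence.values())
    core = set.intersection(*families)  # present in every strain
    pan = set.union(*families)          # present in at least one strain
    return core, pan


def phenotype_specific(presence, case_strains):
    """Families present in every case strain and absent from all other strains."""
    cases = [presence[s] for s in case_strains]
    controls = [genes for s, genes in presence.items() if s not in case_strains]
    in_all_cases = set.intersection(*cases)
    in_any_control = set.union(*controls) if controls else set()
    return in_all_cases - in_any_control
```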

Then I would look at the systems involved in virulence: secretion systems (types II, III, IV, VI and VII) and their effectors. These are crucial in establishing interactions with other bacteria and with eukaryotes. The virulent strains may have specific effectors that the others lack.

There are also dedicated databases of virulence factors; you could take these proteins and BLAST them against your genomes. I would not recommend this approach, though, because you get many false hits.
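If you do try the database approach, filtering hits on identity, query coverage and e-value removes many of the spurious matches. The sketch below parses BLAST+ tabular output in the default `-outfmt 6` column order; the thresholds are arbitrary assumptions, and query lengths are supplied separately (for example from the FASTA file of virulence-factor proteins).

```python
def filter_blast_hits(lines, query_lengths, min_identity=80.0,
                      min_coverage=0.8, max_evalue=1e-10):
    """Keep BLAST tabular hits passing identity, query-coverage and e-value cut-offs.

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        pident = float(f[2])
        qstart, qend = int(f[6]), int(f[7])
        evalue = float(f[10])
        # Fraction of the query covered by this alignment.
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage and evalue <= max_evalue:
            kept.append((qseqid, sseqid, pident, round(coverage, 3), evalue))
    return kept
```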

Finally, you could look at dN/dS among the strains; perhaps the virulent strains have specific genes under positive selection compared to the others. You can apply this only to orthologous genes, and you should be careful if the strains are too close to each other. Very closely related strains often show higher dN/dS than expected. This seems to be not a real effect of positive selection, but rather a bias due to the short divergence time between the strains.
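Dedicated tools (for example yn00 or codeml from PAML) are the usual choice for dN/dS. To show what is being estimated, here is a minimal, unoptimized sketch of the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction for one pair of aligned, in-frame orthologous coding sequences. It assumes the standard genetic code, ignores transition/transversion bias, counts changes to stop codons as nonsynonymous when counting sites (one of several conventions), and skips codons containing gaps or ambiguous bases.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO))


def syn_sites(codon):
    """Sum over positions of the fraction of single-base changes that keep the amino acid."""
    s = 0.0
    for i in range(3):
        for b in "ACGT":
            if b != codon[i] and CODE[codon[:i] + b + codon[i + 1:]] == CODE[codon]:
                s += 1 / 3
    return s


def codon_diffs(c1, c2):
    """Average (synonymous, nonsynonymous) differences over all mutational
    pathways from c1 to c2, skipping pathways that pass through a stop codon."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, sd, nd, valid = c1, 0, 0, True
        for i in order:
            nxt = current[:i] + c2[i] + current[i + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if valid:
            syn_total += sd
            non_total += nd
            n_paths += 1
    if n_paths == 0:  # every pathway hits a stop codon; contributes nothing
        return 0.0, 0.0
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        # Skip codons with gaps/ambiguous bases, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_diffs(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

With near-identical strains, Sd and Nd are very small counts, so a single substitution can swing the ratio dN/dS considerably.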

Hope this helps

Written 8.7 years ago by dago; updated 5.8 years ago by Ram.

8.7 years ago

Naren

This article may help you.

Written 8.7 years ago by Naren.

Ram · Accepted Answer · 2012-07-23

I don't think there is a single best standard approach to this, but Bakker et al. took a comparative genomics approach in Listeria with the goal of identifying virulence genes. You could simply try to replicate their methods on your data.

Their main focus seems to be on the gene level: identifying a subset of virulence-related candidate genes and then looking for whole-gene deletions. In doing so, I would also draw on biological knowledge about bacteria in general, and about this specific species, to decide what good candidate gene sets are (e.g. the type III secretion system). The aim is to identify differences in gene inventory between the strains.
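Since reads have already been mapped to an annotated reference, one simple way to spot whole-gene deletions is the breadth of coverage of each reference gene in each strain. This sketch assumes per-base depth lines such as those written by `samtools depth -a` and a hypothetical `genes` mapping built from the reference annotation; the depth and breadth thresholds are assumptions. It only finds genes missing relative to the reference (genes absent from the reference need the de novo assemblies), and low breadth can also reflect high divergence rather than a true deletion.

```python
from collections import defaultdict


def gene_breadth(depth_lines, genes, min_depth=5):
    """Fraction of each gene's bases covered by at least min_depth reads.

    depth_lines: 'chrom<TAB>pos<TAB>depth' lines, e.g. from `samtools depth -a`.
    genes: name -> (chrom, start, end), 1-based inclusive reference coordinates.
    """
    depth = defaultdict(dict)
    for line in depth_lines:
        chrom, pos, d = line.split("\t")[:3]
        depth[chrom][int(pos)] = int(d)
    breadth = {}
    for name, (chrom, start, end) in genes.items():
        covered = sum(
            1 for p in range(start, end + 1) if depth[chrom].get(p, 0) >= min_depth
        )
        breadth[name] = covered / (end - start + 1)
    return breadth


def putative_deletions(breadth, max_breadth=0.1):
    """Genes whose breadth of coverage is low enough to suggest a deletion."""
    return sorted(g for g, b in breadth.items() if b <= max_breadth)
```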

In addition, in case the whole-gene deletion/insertion approach does not yield good candidate genes, I would run all the raw data through a variant-calling pipeline and look at smaller variation such as small insertions or deletions, frameshifts, and mutations in coding and promoter sequences. A standard pipeline (e.g. BWA followed by samtools) could be used for this. You can then check which variants are shared by strains with a given phenotype and assess their effect at the protein level. Looking at non-coding elements might also be interesting.
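Annotation tools such as SnpEff assess variant effects genome-wide. The sketch below only illustrates the underlying logic for a single substitution in a coding sequence given on its coding strand (variants in minus-strand genes would first need reverse-complementing), plus the frame check for indels; the function names and inputs are hypothetical, and the standard genetic code is assumed.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip([a + b + c for a in BASES for b in BASES for c in BASES], AMINO))


def snv_effect(cds, pos, alt):
    """Classify a single-nucleotide change in a coding sequence.

    cds: CDS on the coding strand, starting at the start codon.
    pos: 1-based position of the change within the CDS.
    alt: alternative base on the coding strand.
    Returns (effect, label) where label is e.g. 'K2E' (ref aa, codon number, alt aa).
    """
    cds = cds.upper()
    i = pos - 1
    start = i - i % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:i - start] + alt.upper() + ref_codon[i - start + 1:]
    ref_aa, alt_aa = CODE[ref_codon], CODE[alt_codon]
    label = f"{ref_aa}{start // 3 + 1}{alt_aa}"
    if ref_aa == alt_aa:
        return "synonymous", label
    if alt_aa == "*":
        return "nonsense", label
    if ref_aa == "*":
        return "stop_lost", label
    return "missense", label


def indel_effect(ref, alt):
    """An indel in a CDS shifts the frame unless its length change is a multiple of 3."""
    return "in-frame" if (len(alt) - len(ref)) % 3 == 0 else "frameshift"
```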

Ram · Accepted Answer · 2012-07-23

Part of my thesis used comparative genomics to get closer to uncovering the genomic basis of virulence in Neisseria meningitidis. Two good papers to look at are Schoen et al. 2008 and my own paper, Katz et al. 2011.

Schoen et al. compared genomes at the gene level; my group looked for SNP-level differences. For the genome-project part, i.e. assembly through annotation, we used CG-Pipeline. However, a more interactive service such as RAST or BASys might be easier (although its annotations are less comprehensive). NCBI's PGAAP annotations, on the other hand, would make it easiest to submit the genomes to NCBI at the end of the study.

Anyway, I think the methods in Schoen et al. and Katz et al. would be very relevant to your questions.