Question

What Do These Syntenic Dot Plots Mean?

0

10.9 years ago

Jeff Wintersinger ▴ 60

Hello!

I am attempting to locate synteny between two recently released genomes for the parasitic nematode Haemonchus contortus. I expect pervasive synteny: although the genomes correspond to divergent strains (one is an inbred laboratory strain, while the other is an African field isolate), they are nevertheless the same organism. To do this, I've tried to construct a syntenic dot plot.

One genome is 370 Mb, while the other is 320 Mb. My task is complicated by the fact that both genomes are of draft quality, meaning that I must compare the 26,000 contigs of one with the 14,400 contigs of the other, rather than the seven chromosomes of Haemonchus.

To construct the syntenic map, I've tried five programs. SyMAP, Mauve, and LAST's last-dotplot.py have all failed with one problem or another. SynMap (note it's distinct from SyMAP) and MUMmer both worked, but have produced rather peculiar output. While I have scant understanding of syntenic dotplots, the images produced (included at the end of this posting) seem to indicate terribly little synteny. I find this peculiar -- using the same tools, I've tried comparing C. briggsae and C. elegans, which demonstrate a great deal more synteny despite being (presumably) much more distant from each other than my two Haemonchus strains.

My questions are thus:

1. What do the syntenic maps indicate? What exactly does the positioning of each dot in my dot plots demonstrate?
2. Is it at all biologically plausible that there is really so little synteny preserved between my two samples? As part of my analysis, I've also tried comparing the strains' respective proteomes using InParanoid. Though the two genomes bear a comparable number of annotated genes (21,800 in one, 23,600 in the other), I again saw much more divergence than I expected -- only 35% of the genes in each bore an orthologue in the other genome. The same analysis on C. briggsae and C. elegans found orthologues for 65% of the genes in each genome.
3. Are my problems perhaps a result of my comparing tens of thousands of scaffolds in each genome against each other, rather than a small number of chromosomes? I've considered comparing only the 100 (or 1000) largest scaffolds from each genome to reduce the demands I'm making on my tools, but this would likely destroy whatever hope I have of making a valid comparison, given that these would encompass substantially different portions of the respective assemblies. The largest 100 scaffolds compose only 10% and 7% of the respective genomes, while the largest 1000 scaffolds compose only 46% and 36%.
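Coverage figures like those in the third question can be computed with a short script. What follows is only a minimal sketch, not the method used for the numbers above: it assumes each assembly is available as FASTA text, and the function names (`read_fasta_lengths`, `fraction_in_largest`) and the toy input are hypothetical, chosen purely for illustration.

```python
def read_fasta_lengths(lines):
    """Return {sequence_id: length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def fraction_in_largest(lengths, n):
    """Fraction of the total assembly length held by the n longest sequences."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    top = sorted(lengths.values(), reverse=True)[:n]
    return sum(top) / total


# Toy input (hypothetical): four scaffolds of length 10, 5, 3 and 2.
toy = [">s1", "ACGTACGTAC", ">s2", "ACGTA", ">s3", "ACG", ">s4", "AC"]
toy_lengths = read_fasta_lengths(toy)
toy_top1 = fraction_in_largest(toy_lengths, 1)
toy_top2 = fraction_in_largest(toy_lengths, 2)
```

For a real assembly, the same functions could be fed an open file handle instead of the toy list, e.g. `read_fasta_lengths(open(path))`, where `path` is whatever your FASTA file is called.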

I would much appreciate any help. Thanks!

SynMap yielded this: [image: SynMap]

MUMmer yielded this: [image: MUMmer]

• 7.2k views

updated 10.9 years ago by Josh Herr 5.8k • written 10.9 years ago by Jeff Wintersinger ▴ 60

2

I think you need to filter so that you're only testing the largest contigs. If I understand correctly, each square in that figure is a pair of chromosomes, so the squares are too small to see any synteny.

10.9 years ago by brentp 24k

score 3 · Accepted Answer · 2014-02-04

I work with eukaryote genomes even larger than the ones you are working with, and in theory you shouldn't have any problem with the tools you mentioned, particularly Mauve and SyMAP. I would follow brentp's advice in the comment above and test your analysis using only the largest contigs.

Better yet would be to try to close gaps and create supercontigs with what you have. You'll have to use a reference for this; it looks like there are a few nematode genomes you could use to "seed" or anchor your contigs. There are a few tools (CONTIGuator, which works for eukaryotes too, and PAGIT are just two examples) that can be used to join contigs and close gaps before you analyze your unfinished draft genomes. You may also have to design primers and do some Sanger sequencing to close gaps.
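For the simpler suggestion of testing with only the largest contigs, a filter along these lines may be enough to get started. This is a minimal sketch using only the Python standard library; the function names (`parse_fasta`, `largest_contigs`, `format_fasta`) and the toy records are hypothetical, and it assumes the selected sequences fit in memory.

```python
import heapq


def parse_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def largest_contigs(records, n):
    """Return the n longest (header, sequence) records, longest first."""
    return heapq.nlargest(n, records, key=lambda rec: len(rec[1]))


def format_fasta(records, width=60):
    """Render records as FASTA text, wrapping sequence lines at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Toy input (hypothetical): four contigs of length 4, 10, 2 and 7.
toy_lines = [">c1", "ACGT", ">c2", "ACGTAC", "GTAC", ">c3", "AC", ">c4", "ACGTACG"]
kept = largest_contigs(parse_fasta(toy_lines), 2)
kept_text = format_fasta(kept, width=4)
```

Running the same filter on both assemblies and feeding the reduced FASTA files to the dot plot tool should make each square in the plot large enough to inspect, at the cost of covering only part of each genome.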