Detecting Polyploidy Using An Assembly
13.2 years ago
Fabian Bull ★ 1.3k

A biologist once asked me whether it is possible to detect polyploidy from an assembly. My first thought was no, because duplicated genomic regions are merged (collapsed) in an assembly.

Is this reasoning correct, or is it possible after all? If so, are there any tools for it?

assembly • 7.9k views
13.2 years ago
Christian ★ 3.1k

One way to test for polyploidy is probably to realign the raw reads against the assembly and count the read frequencies of variants.

In a diploid genome, we expect to find predominantly variants supported by 50% of the reads (two alleles, heterozygosity). In a triploid genome, we expect to find variants supported by roughly 33% and 67% of the reads (three alleles). Tetraploid: 25%, 50%, and 75%. And so on.
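To make the expected fractions concrete, here is a minimal sketch. It assumes that a variant can sit on any number of chromosome copies from 1 to ploidy - 1, so a tetraploid also includes 50% (two of four copies). The function name `expected_allele_fractions` is just an illustration, not part of any tool.

```python
from fractions import Fraction

def expected_allele_fractions(ploidy):
    """Read fractions expected at heterozygous sites: k/ploidy for k = 1..ploidy-1."""
    return [Fraction(k, ploidy) for k in range(1, ploidy)]

# Print the expected variant read fractions for a few ploidy levels.
for ploidy in (2, 3, 4):
    fractions = expected_allele_fractions(ploidy)
    print(ploidy, [f"{float(f):.2f}" for f in fractions])
```

In real data these peaks are blurred by sampling noise and mapping bias, so they show up as broad humps in a histogram rather than sharp spikes.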

I expect many caveats to this approach, though. First, it assumes that all polyploid chromosomes are truly collapsed into a single chromosome in your assembly and not assembled as separate contigs. Second, it requires very high read coverage across the genome to call low-frequency variants. Third, read coverage will fluctuate, making reliable estimates of read frequencies difficult. Fourth, variant mis-calls and duplicated regions will complicate the picture.

However, it may be possible to pool the genome-wide evidence from many thousands of variants to come up with the most likely polyploidy status of your assembly.
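One way the pooling could look, as a rough sketch rather than an established method: treat each heterozygous site as a draw from a mixture of binomials whose success probabilities are the allele fractions k/ploidy, sum the log-likelihood over all sites, and pick the ploidy that scores best. The equal mixture weights, the candidate ploidies, and all function names here are my own assumptions. The input is a list of hypothetical (variant reads, total depth) pairs from sites that have already been filtered.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def ploidy_log_likelihood(counts, ploidy):
    """Sum of per-site log-likelihoods under an equal-weight binomial mixture.

    counts: list of (variant_reads, depth) at heterozygous sites.
    """
    fractions = [k / ploidy for k in range(1, ploidy)]
    log_weight = math.log(1.0 / len(fractions))
    total = 0.0
    for alt, depth in counts:
        logs = [log_weight + log_binom_pmf(alt, depth, f) for f in fractions]
        peak = max(logs)
        # log-sum-exp keeps the mixture sum numerically stable
        total += peak + math.log(sum(math.exp(x - peak) for x in logs))
    return total

def most_likely_ploidy(counts, candidates=(2, 3, 4)):
    """Return the candidate ploidy with the highest pooled log-likelihood."""
    return max(candidates, key=lambda p: ploidy_log_likelihood(counts, p))

# Made-up counts for illustration only.
example_counts = [(20, 60), (41, 60), (19, 60), (40, 60), (22, 60)]
print(most_likely_ploidy(example_counts))
```

The equal weights act as a mild penalty on higher ploidies, which have more mixture components. This sketch ignores reference bias, miscalls, and paralogous regions, so all the caveats listed above still apply.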

This is a clever idea and could probably be applied to some recently derived auto-polyploids.

That's a brilliant idea! I first saw it applied in Yoshida et al. 2013 (see Figure 9). We used it in our analysis of Candida orthopsilosis hybrids (Figure S5). What's cool is that we were able to detect copy number variations of individual chromosomes or even chromosomal arms (C. metapsilosis, submitted)!

Have any new methods come up?

13.2 years ago
Philippe ★ 1.9k

Hi,

If you still have the raw reads (before mapping) or the mapped reads, you can try to identify duplicated genomic regions by detecting regions with significantly higher coverage. If your global coverage is n and some region has a coverage of (theoretically) 2n, this might indicate that the region is duplicated. Unfortunately, I don't have an obvious reference to share (these are just memories from a presentation), but you can check how CNVs are detected from read depth, for example (even though other methods are more widely used).
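As a rough illustration of the idea, the sketch below flags fixed-size windows whose mean depth is close to twice the genome-wide median. The per-window depths are assumed to come from whatever depth tool you already use. The 1.8 threshold and the function name are arbitrary choices of mine, not values from any publication.

```python
from statistics import median

def flag_high_coverage_windows(window_depths, min_ratio=1.8):
    """Return (window_index, depth / median_depth) for windows at or above min_ratio."""
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    flagged = []
    for index, depth in enumerate(window_depths):
        ratio = depth / baseline
        if ratio >= min_ratio:
            flagged.append((index, ratio))
    return flagged

# Made-up mean depths for eight consecutive windows.
depths = [30, 31, 29, 62, 30, 28, 59, 30]
for index, ratio in flag_high_coverage_windows(depths):
    print(f"window {index}: {ratio:.2f}x the median depth")
```

The median is used as the baseline so that a few duplicated windows do not pull the reference level upwards.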

I hope it has been helpful.

Addition: from the comments below it seems the method is non-trivial and might not be the most suitable one.

In practice, this approach is a lot harder to execute, since polyploidy is a global phenomenon. It is therefore futile to look for a local doubling of coverage, which would be expected only for a segmental duplication or CNV.

As a minor note, coverage can vary a lot across the genome, mostly due to GC content, so a 2-fold difference by itself might not mean much. You'd need a control sample to compare to.
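A comparison against a control could be sketched like this, assuming you have per-window depths for both samples over the same windows. Dividing by the control cancels biases that both libraries share (such as GC-related dips), and dividing by the median ratio corrects for different sequencing depths. The function name and the example numbers are made up.

```python
import math
from statistics import median

def log2_ratio_vs_control(sample_depths, control_depths):
    """Per-window log2(sample/control), centred on the median ratio.

    Windows with zero depth in either sample are returned as None.
    """
    raw = [s / c if s > 0 and c > 0 else None
           for s, c in zip(sample_depths, control_depths)]
    valid = [r for r in raw if r is not None]
    if not valid:
        raise ValueError("no window has coverage in both samples")
    baseline = median(valid)
    return [None if r is None else math.log2(r / baseline) for r in raw]

# Window 1 and 4 are low in both samples (shared bias); window 3 is doubled
# in the sample only.
sample = [40, 20, 40, 80, 20]
control = [20, 10, 20, 20, 10]
print(log2_ratio_vs_control(sample, control))
```

A value near 1 then suggests a doubling relative to the control, while windows that are low in both samples stay near 0. Note that a whole-genome ploidy change would be normalized away by the median step, so this only addresses local differences.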

Based on my own experience, I agree with swbarnes2. Coverage varies a lot due to many other factors, even if you have high coverage across the entire genome. In practice, the 2-fold coverage approach is very hard to quality-control and calibrate.

Definitely one approach. Thanks.

I agree with you (and with Casey Bergman's post, which raised the same concern). If we consider polyploidy as the presence of supernumerary chromosomes (the actual definition), this approach won't work. But the question mentions "duplicate genomic regions", which motivated me to help with identifying such regions. I could have been more precise.

Thanks for your input. As I mentioned, I have only heard about such methods and have no experience with them. Reading the comments from more experienced people, it seems this is not trivial, and other methods or additional experimental work might be more suitable. I'll update my first post accordingly.

13.2 years ago

Interesting question. In theory, it is not possible to detect a recent, complete auto-polyploid genome from a WGS assembly, since the copy number of all chromosomes would scale perfectly with ploidy. That is, if all regions of the genome in a polyploid are the same (i.e. no sequence variation among homologous chromosomes), you can't tell if the genome is 1C, 2C, 4C, etc.

However, for an allo-polyploid genome, or for partial (auto- or allo-) polyploidy that is not complete across the genome, it should be possible to detect the polyploidy from the assembly of divergent haplotypes or from regional differences in read depth, as noted by Philippe.

13.2 years ago
Eitan Rubin ▴ 30

In plants, there is work suggesting that polyploidization is accompanied by rapid accumulation of mutations. So it should be possible to find multiple alleles, i.e. heterozygosity for SNPs and indels.

Look up Avi Levy's work from the Weizmann Institute of Science (he worked on wheat).

Eitan Rubin

Hi Eitan, I looked through Avraham Levy's papers for the "rapid accumulation of mutations" you mentioned, but without luck. Can you point me to a specific paper that elaborates on that concept? Thanks.

13.2 years ago
lexnederbragt ★ 1.3k

I don't agree with Casey Bergman that "it is not possible to detect a recent, complete auto-polyploid genome from a WGS assembly". Say your genome is 1 Gb in size and you sequence it to 100x coverage, i.e. 100 Gb of reads. After assembly, for a non-duplicated genome, you would expect an assembly size of approximately 1 Gb, with an average coverage of the non-repetitive parts of approx. 100x. If the genome were instead a complete auto-polyploid (all chromosomes duplicated), the duplicate chromosomes would collapse during assembly, so you would see something like a 0.5 Gb total assembly size with an average coverage of the non-repetitive parts of 200x. This is of course an ideal situation, but you get the idea.
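The arithmetic in this idealized example can be written out as below. It only works if the true genome size is known from an independent source, and it ignores repeats and uneven throughput. The function names are mine.

```python
def mean_coverage(total_sequenced_bp, assembly_size_bp):
    """Average depth when all reads map onto an assembly of the given size."""
    return total_sequenced_bp / assembly_size_bp

def implied_copies_per_contig(known_genome_size_bp, assembly_size_bp):
    """How many genomic copies each assembled base stands for, on average."""
    return known_genome_size_bp / assembly_size_bp

GB = 1_000_000_000
total_reads_bp = 100 * GB    # 100 Gb of sequence
genome_size_bp = 1 * GB      # assumed to be known independently

# Non-duplicated genome: assembly is about as large as the genome.
print(mean_coverage(total_reads_bp, 1 * GB))
# Fully collapsed duplicate chromosomes: assembly is half the genome size.
print(mean_coverage(total_reads_bp, GB // 2))
print(implied_copies_per_contig(genome_size_bp, GB // 2))
```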

This requires knowledge of the genome size, which was not stated in the question. Clearly, with additional information (e.g. a reference genome, knowledge of genome size, cytological information), ploidy can be estimated. That said, I would not consider overall fold-changes in depth of coverage convincing evidence, since the observed throughput of a WGS experiment often differs from the expected throughput, and you could make false inferences with this approach.

You're right, I had forgotten that one needs an estimated genome size for this. I stand corrected...
