Question

Coverage Experiment To Indicate Polyploidy.

6

13.1 years ago

Fabian Bull ★ 1.3k

I am searching for hints of polyploidy in two species.

My hypothesis is that species A is diploid and B is tetraploid.

Because the sequencing depth differs between the two species, I extracted reads in the ratio 1:20 (the ratio of the genome sizes) to get similar expected coverage.
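A minimal sketch of the arithmetic behind this subsampling, assuming equal read lengths in both data sets. The genome sizes, read length, and target coverage below are hypothetical placeholders chosen only to give a 1:20 size ratio; they are not values from my data.

```python
def expected_coverage(n_reads, read_length, genome_size):
    """Expected average coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size

def reads_for_coverage(target_coverage, read_length, genome_size):
    """Number of reads to subsample to reach a target expected coverage."""
    return round(target_coverage * genome_size / read_length)

# Hypothetical values (assumptions for illustration only).
READ_LENGTH = 100
SMALL_GENOME = 100_000_000      # 100 Mb
LARGE_GENOME = 2_000_000_000    # 2 Gb, i.e. 20x the small genome
TARGET = 30                     # desired expected coverage

reads_small = reads_for_coverage(TARGET, READ_LENGTH, SMALL_GENOME)
reads_large = reads_for_coverage(TARGET, READ_LENGTH, LARGE_GENOME)

# The read counts come out in the same 1:20 ratio as the genome sizes.
print(reads_small, reads_large, reads_large / reads_small)
```

With equal read lengths, matching expected coverage reduces to sampling reads in proportion to genome size, which is why the read ratio equals the genome-size ratio.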

The two sets of reads were assembled de novo independently, and the coverage of the resulting contigs was analyzed. A histogram of the coverages is given below:

[Figure: histogram of contig coverages for the two species]

You can see that the peak for species B is at approximately double the coverage of the peak for species A. The different tail shapes could be explained by differences in the structure of repetitive elements.
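One way to put a number on the peak shift, as a rough sketch: bin the per-contig coverages, take the most populated bin as the peak, and compare the two peaks. The function name `coverage_peak` and the toy coverage lists are made up for illustration and are not my real data.

```python
from collections import Counter

def coverage_peak(coverages, bin_width=1.0):
    """Return the centre of the most populated coverage bin (the histogram mode)."""
    bins = Counter(int(c // bin_width) for c in coverages)
    peak_bin, _ = max(bins.items(), key=lambda kv: kv[1])
    return (peak_bin + 0.5) * bin_width

# Toy per-contig coverages (hypothetical), each with one high-coverage
# outlier standing in for the repeat-driven tail.
cov_a = [9.2, 10.1, 10.4, 10.8, 11.5, 30.0]
cov_b = [19.5, 20.2, 20.6, 20.9, 22.0, 55.0]

peak_a = coverage_peak(cov_a)
peak_b = coverage_peak(cov_b)
print(peak_a, peak_b, peak_b / peak_a)
```

Using the mode rather than the mean keeps the estimate from being pulled upward by the long tail of repetitive contigs.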

Questions:

Do you see any flaws in this experiment?
Could there be other reasons for the shifted peak?
Can you imagine other experiments which could indicate polyploidy?

Edit: If you agree with my thought process, please also add a comment/answer.

assembly • 4.4k views

ADD COMMENT • link updated 13.1 years ago by Chris Miller 22k • written 13.1 years ago by Fabian Bull ★ 1.3k

0

I think I understand what you are after: basically, more DNA coming from one experiment indicates that it has more copies. But how did this really work? What does it mean to have extracted reads in the ratio of 1:20? Were the samples sequenced separately or together?

ADD REPLY • link 13.1 years ago by Istvan Albert 103k

0

They were sequenced and assembled separately. My idea was that the duplicated regions might collapse in the assembly and therefore have reads mapped to them twice as often as the other regions.

ADD REPLY • link 13.1 years ago by Fabian Bull ★ 1.3k

score 3 · Answer 1 · 2012-07-09

3

13.1 years ago

Chris Miller 22k

What about plotting the allele frequencies for all heterozygous SNP sites? Assuming that there's a little bit of CN variation in these genomes, you'd expect to see the following:

diploid organism: big peak at 50% (neutral, CN2), smaller peaks at 33% and 66% (+1 copy, CN3)
tetraploid organism: peaks at 25%, 50%, 75% (neutral, CN4), possibly smaller peaks at 33% and 66% (-1 copy, CN3) and at 20%, 40%, 60%, 80% (+1 copy, CN5)

This approach depends on having deep enough coverage to resolve those peaks. If not, you'll just get a big smear.
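The expected peak positions listed above follow a simple rule: at a heterozygous site in a region with CN copies, the counted allele sits on k of those copies, so its expected read fraction is k/CN for k = 1 .. CN-1. A small sketch (the function name is just for illustration):

```python
from fractions import Fraction

def expected_het_fractions(copy_number):
    """Expected allele fractions at a heterozygous site with `copy_number` copies.

    The allele can be present on k = 1 .. copy_number - 1 copies,
    giving an expected read fraction of k / copy_number.
    """
    return [Fraction(k, copy_number) for k in range(1, copy_number)]

for cn in (2, 3, 4, 5):
    peaks = ", ".join(f"{float(f):.0%}" for f in expected_het_fractions(cn))
    print(f"CN{cn}: {peaks}")
```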

ADD COMMENT • link 13.1 years ago by Chris Miller 22k

0

Could you explain what you mean by CN? I assume copy number? Are CN2 sites SNPs that occur in two variants? Sorry, but I am not really familiar with SNP terminology.

ADD REPLY • link 13.1 years ago by Fabian Bull ★ 1.3k

2

Yes, CN2 = diploid (two copies of that gene). If it's a heterozygous site, say C/T, and you count the number of reads containing each of those alleles, then roughly 50% should be C and 50% should be T. If it's a region that has three copies, again with a het SNP C/T, then your options are 33% C and 66% T, or 66% C and 33% T. There will be some variation around these numbers due to sampling error, but if you look at lots of sites, you'll see peaks emerge.
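To get a feel for how sampling error blurs the peaks, here is a rough simulation sketch using only the standard library. It assumes each read independently samples the allele with probability k/CN (simple binomial sampling, no sequencing error or mapping bias) and that k is drawn uniformly; the function name, depths, and site counts are arbitrary illustrative choices.

```python
import random
import statistics

def simulate_allele_fractions(copy_number, depth, n_sites, rng):
    """Simulate observed allele fractions at heterozygous sites.

    Each site carries the counted allele on k of `copy_number` copies
    (k drawn uniformly from 1 .. copy_number - 1, an assumption), and each
    of `depth` reads shows that allele with probability k / copy_number.
    """
    fractions = []
    for _ in range(n_sites):
        k = rng.randint(1, copy_number - 1)
        p = k / copy_number
        hits = sum(rng.random() < p for _ in range(depth))
        fractions.append(hits / depth)
    return fractions

rng = random.Random(0)  # fixed seed so the run is reproducible
deep = simulate_allele_fractions(copy_number=2, depth=200, n_sites=500, rng=rng)
shallow = simulate_allele_fractions(copy_number=2, depth=10, n_sites=500, rng=rng)

# The deep sample clusters tightly around 0.5; the shallow one is far wider.
print("deep:   ", statistics.mean(deep), statistics.stdev(deep))
print("shallow:", statistics.mean(shallow), statistics.stdev(shallow))
```

Running the same function with `copy_number=4` at low depth shows the 25%, 50%, and 75% peaks merging into the smear mentioned above.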

ADD REPLY • link 13.1 years ago by Chris Miller 22k