Question

How to Check for Ploidy Using NGS Only?

3

11.4 years ago

Adrian Pelin ★ 2.7k

Hello,

I am working on a single-celled organism that I am isolating from a natural environment and sequencing with Illumina paired-end (PE) reads.

How can I determine its ploidy? There is no reference genome.

I was thinking of mapping the reads to a de novo assembly, then checking the maximum number of alleles I can find per locus and their frequencies.
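For illustration, here is a minimal sketch of the per-locus allele tally described above. It assumes the per-site base counts have already been extracted from the alignments (for example from a pileup); the function names `allele_profile` and `alleles_per_site_histogram`, the thresholds `min_count` and `min_freq`, and the example sites are all hypothetical, not part of any particular tool.

```python
from collections import Counter


def allele_profile(site_counts, min_count=3, min_freq=0.1):
    """Return {site: sorted allele frequencies} for alleles passing both filters.

    site_counts maps a site key to a {base: read_count} dict.
    min_count and min_freq are illustrative thresholds meant to drop
    alleles that are more likely sequencing errors than real variants.
    """
    profile = {}
    for site, counts in site_counts.items():
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to say about this site
        profile[site] = sorted(
            n / depth
            for n in counts.values()
            if n >= min_count and n / depth >= min_freq
        )
    return profile


def alleles_per_site_histogram(profile):
    """Count how many sites carry 1, 2, 3, ... supported alleles."""
    return Counter(len(freqs) for freqs in profile.values())


# Hypothetical base counts at three sites of a draft assembly.
sites = {
    ("contig1", 100): {"A": 30, "G": 10},
    ("contig1", 250): {"C": 40},
    ("contig2", 17): {"T": 20, "C": 19, "A": 1},
}
profile = allele_profile(sites)
histogram = alleles_per_site_histogram(profile)
print(histogram)
print("max alleles at any site:", max(histogram))
```

The largest number of well-supported alleles at a site only gives a lower bound on ploidy, and collapsed repeats or paralogs in the assembly can inflate it, so the frequency distribution across many sites is probably more informative than any single locus.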

Adrian

ngs mapping • 5.3k views

ADD COMMENT • link updated 6.8 years ago by kamiljaron ▴ 230 • written 11.4 years ago by Adrian Pelin ★ 2.7k

0

Sounds reasonable. Maybe you can do this even faster with a kmer-based approach, provided you find a way to differentiate between alleles and sequencing errors.

ADD REPLY • link 11.4 years ago by Christian ★ 3.1k

1

This is an excellent idea, and I have already tried it :) I built a kmer graph, and I see 3 peaks. The last peak is the broadest, the second peak is at half the coverage of the last, and the first peak is at half the coverage of the second. This suggests that allele frequencies in the data set are 0.25, 0.50, or 1.00.

This potentially suggests the organism is tetraploid. However, I am missing a peak at 0.75; I believe that because the 1.00 peak is so broad, it is hiding the 0.75 peak.
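The reasoning above can be checked with a small sketch: in an organism of ploidy p, an allele present on i of the p copies should appear at frequency i/p, so a ploidy is consistent with the observed peaks only if every peak maps onto a whole number of copies. The helper names `consistent_ploidies` and `missing_classes` and the tolerance `tol` are hypothetical choices for illustration.

```python
def consistent_ploidies(observed_freqs, max_ploidy=8, tol=0.1):
    """List ploidies for which every observed frequency is close to i/ploidy.

    tol is the allowed deviation in copy-number units (an assumption,
    tune it to how sharp the peaks are).
    """
    result = []
    for ploidy in range(1, max_ploidy + 1):
        dosages = [f * ploidy for f in observed_freqs]
        if all(round(d) >= 1 and abs(d - round(d)) <= tol for d in dosages):
            result.append(ploidy)
    return result


def missing_classes(observed_freqs, ploidy):
    """Return the expected frequencies i/ploidy that have no observed peak."""
    seen = {round(f * ploidy) for f in observed_freqs}
    return [i / ploidy for i in range(1, ploidy + 1) if i not in seen]


peaks = [0.25, 0.50, 1.00]  # peak positions relative to the last peak
print(consistent_ploidies(peaks))  # [4, 8]
print(missing_classes(peaks, 4))   # [0.75]
```

With peaks at 0.25, 0.50, and 1.00, tetraploidy is the lowest ploidy that fits, and 0.75 is the only class without a peak, which matches the interpretation in the post. Higher multiples such as 8 also fit numerically, so the peaks alone do not rule them out.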

Do these conclusions sound correct?

As for sequencing errors: this is Illumina data, and the run was of high quality, so I suppose the errors would simply contribute to the bell curve in my peaks. Any other suggestions? I could filter reads based on quality, but I have heard people warn against this because it introduces bias.

ADD REPLY • link 11.4 years ago by Adrian Pelin ★ 2.7k

0

Your conclusion sounds reasonable to me, although I am not an expert on interpreting these peaks. With respect to the sequencing errors, I think you are also right. To clean up your data, you could also just throw away all kmers that occur only a few times, since most error-containing kmers are rare.
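As a rough sketch of that clean-up step, the snippet below counts kmers in a handful of reads, drops those seen fewer than `min_count` times, and builds the coverage histogram from what remains. The function names, the toy reads, and the threshold are hypothetical, and for brevity it does not merge a kmer with its reverse complement, which a dedicated kmer counter would normally do.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every kmer of length k across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_rare(counts, min_count):
    """Keep only kmers seen at least min_count times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


def spectrum(counts):
    """Histogram: coverage value -> number of distinct kmers at that coverage."""
    return Counter(counts.values())


# Toy reads; the second differs from the first at its last base,
# standing in for a sequencing error.
reads = ["ACGTACGT", "ACGTACGA"]
counts = count_kmers(reads, k=4)
kept = drop_rare(counts, min_count=2)
print(sorted(kept.items()))
print(spectrum(kept))
```

On real data the threshold would usually be placed in the valley between the low-coverage error peak and the first genomic peak rather than fixed in advance.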

ADD REPLY • link 11.4 years ago by Christian ★ 3.1k

0

Hi Adrian, I am also using a kmer strategy to determine ploidy. Can you tell me more about the tool you used and what you did with the output, please?

ADD REPLY • link 8.4 years ago by dilution • 0

0

Couldn't one of the additional peaks be an organellar genome?

ADD REPLY • link 11.4 years ago by bewickaj ▴ 10

0

Good point, but I work on an organism that does not have an organellar genome. The peaks could also originate from contaminants such as bacteria.

That is why I mapped my reads (with bwa) to the contigs of my draft assembly that I am certain come from the correct nuclear genome, and used only the reads that mapped to construct the kmer graph.

ADD REPLY • link 11.4 years ago by Adrian Pelin ★ 2.7k

score 2 · Answer 1 · 2018-09-26

You can get an idea about ploidy from raw sequencing reads by looking at the kmer spectrum. We designed a method called smudgeplot. It extracts kmer pairs with near-identical sequences, which usually represent heterozygous kmers, and plots them in a way that is indicative of ploidy. You can find details about this in our recent preprint or the linked GitHub repository.
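To make the kmer-pair idea concrete, here is a simplified sketch; it is not smudgeplot's actual implementation. It assumes "near-identical" means differing at exactly one position, takes kmer counts as a plain dict, and reports for each unique pair the total coverage A+B and the minor fraction B/(A+B). The function name `kmer_pairs` and the toy counts are hypothetical.

```python
from collections import defaultdict


def kmer_pairs(counts):
    """Find kmer pairs differing at exactly one position.

    Returns a sorted list of (total_coverage, minor_fraction) tuples.
    Only groups with exactly two members are kept, so a kmer with
    several one-mismatch partners at the same position is skipped.
    """
    groups = defaultdict(list)
    for kmer in counts:
        for i in range(len(kmer)):
            # Key on everything except position i, so two kmers share a
            # key only if they are identical apart from that position.
            groups[(i, kmer[:i], kmer[i + 1:])].append(kmer)

    pairs = []
    for members in groups.values():
        if len(members) == 2:
            a, b = (counts[m] for m in members)
            total = a + b
            pairs.append((total, min(a, b) / total))
    return sorted(pairs)


# Toy counts: AAAC/AATC differ only at the third base.
toy_counts = {"AAAC": 30, "AATC": 10, "GGGG": 40}
print(kmer_pairs(toy_counts))  # [(40, 0.25)]
```

In this toy case the pair has a minor fraction of 0.25, the kind of signal one would expect from a variant present on one of four copies; plotting total coverage against minor fraction over all pairs is what makes the ploidy pattern visible.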

--- edit ---

GenomeScope 2.0 and Smudgeplot, tools for the characterisation of genomes of any ploidy directly from sequencing reads, have now been published in a peer-reviewed journal.