How To Check For Ploidy Using NGS Only?
10.8 years ago
Adrian Pelin ★ 2.6k

Hello,

I am working on a single-celled organism that I am isolating from a natural environment and sequencing with Illumina paired-end reads.

How can I determine its ploidy? There is no reference genome.

I was thinking of mapping the reads to a de novo assembly and seeing what the maximum number of alleles per locus is, and what their frequencies are.
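
The per-locus allele counting proposed above could be sketched roughly as follows. This is a minimal illustration, not an established pipeline: the function names, the thresholds (`min_count`, `min_fraction`) and the toy `sites` data are all assumptions made for the example, and real input would come from a pileup of reads mapped to the assembly.

```python
from collections import Counter

def allele_frequencies(base_counts, min_count=3, min_fraction=0.1):
    """Return {base: frequency} for the alleles at one site that pass both
    thresholds. The thresholds are illustrative guesses meant to discard
    bases that are more likely sequencing errors than real alleles."""
    depth = sum(base_counts.values())
    if depth == 0:
        return {}
    kept = {base: n for base, n in base_counts.items()
            if n >= min_count and n / depth >= min_fraction}
    kept_depth = sum(kept.values())
    return {base: n / kept_depth for base, n in kept.items()}

def max_alleles(sites, **thresholds):
    """Largest number of alleles passing the thresholds at any site."""
    return max((len(allele_frequencies(s, **thresholds)) for s in sites),
               default=0)

# Hypothetical per-site base counts, as they might come out of a pileup.
sites = [
    Counter({"A": 40}),                   # one allele
    Counter({"A": 30, "G": 10}),          # two alleles, 0.75 / 0.25
    Counter({"C": 20, "T": 19, "G": 1}),  # the single G is treated as an error
]

for site in sites:
    print(allele_frequencies(site))
print("max alleles per site:", max_alleles(sites))
```

The maximum observed number of alleles is only a lower bound on ploidy, and it is sensitive to the error thresholds chosen.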

Adrian

Sounds reasonable. Maybe you can do this even faster with a kmer-based approach, provided you find a way to differentiate between alleles and sequencing errors.

This is an excellent idea, and I have already tried it :) I built a kmer graph and I see 3 peaks. The last peak is the fattest (if you know what I mean), the second peak is at half the coverage of the last, and the first peak is at half the coverage of the second. This suggests that the allele frequencies in the data set are 0.25, 0.50, and 1.00.

This suggests the organism may be tetraploid. I am missing a peak at 0.75, but I believe the 1.00 peak is so broad that it hides the 0.75 peak.
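
The peak arithmetic above can be written down as a small sketch. It assumes the kmer graph is available as a histogram (`hist[c]` = number of distinct kmers seen `c` times); the toy histogram, the function names and the tolerance are all made up for illustration, and a real histogram would need smoothing before such a naive peak search is useful.

```python
def find_peaks(hist, min_height=1):
    """Return coverages that are local maxima of the histogram."""
    peaks = []
    for c in range(1, len(hist) - 1):
        if (hist[c] >= min_height
                and hist[c] > hist[c - 1]
                and hist[c] >= hist[c + 1]):
            peaks.append(c)
    return peaks

def relative_fractions(peaks):
    """Express each peak coverage as a fraction of the highest-coverage peak."""
    top = max(peaks)
    return [c / top for c in sorted(peaks)]

def missing_fractions(fractions, ploidy, tol=0.05):
    """Fractions i/ploidy (i = 1..ploidy) with no observed peak within tol."""
    expected = [i / ploidy for i in range(1, ploidy + 1)]
    return [e for e in expected
            if not any(abs(e - f) <= tol for f in fractions)]

def add_bump(hist, center, height, step, width):
    """Add a triangular bump to the toy histogram (illustration only)."""
    for offset in range(-width, width + 1):
        hist[center + offset] += height - step * abs(offset)

# Toy histogram with peaks at coverage 25, 50 and 100 (hypothetical values).
hist = [0] * 131
add_bump(hist, 25, height=30, step=5, width=5)
add_bump(hist, 50, height=60, step=10, width=5)
add_bump(hist, 100, height=200, step=10, width=20)

peaks = find_peaks(hist)
fractions = relative_fractions(peaks)
print(peaks, fractions)
print("expected but not seen for ploidy 4:", missing_fractions(fractions, 4))
```

On this toy input the only expected tetraploid fraction without a peak is 0.75, which matches the situation described above; whether it is really hidden under the broad 1.00 peak cannot be decided from the peak positions alone.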

Do these conclusions sound correct?

As for sequencing errors: this is Illumina and the run was of high quality, so I suppose the errors would simply contribute to the bell curve of my peaks. Any other suggestions? I could filter reads based on quality, but I have heard people warn against this since it introduces bias.

Your conclusion sounds reasonable to me, although I am not an expert on interpreting these peaks. With respect to the sequencing errors I think you are also right. To clean up your data you could also just throw away all kmers that occur only a few times.
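
As a rough illustration of discarding rare kmers, the sketch below counts kmers in a handful of made-up reads and drops those seen fewer than `min_count` times. The reads, `k` and the cutoff are arbitrary assumptions; it also ignores reverse complements, which dedicated kmer counters usually merge into canonical kmers.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-length substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def drop_rare(counts, min_count):
    """Keep only kmers observed at least min_count times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}

# Three identical toy reads plus one read carrying a single-base "error".
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
filtered = drop_rare(counts, min_count=2)
print(filtered)
```

The kmers that only occur in the read with the error are seen once and are removed, while the shared kmers survive.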

Hi Adrian, I am also using a kmer strategy to determine ploidy. Can you tell me more about the tool you used and what you did with the output, please?

Couldn't one of the additional peaks be an organellar genome?

Good point, but I work on an organism that does not have an organellar genome. The peaks could also originate from contaminants such as bacteria.

That is why I mapped my reads (with bwa) to a draft assembly restricted to contigs I am certain come from the correct nuclear genome, and used only the reads that mapped to construct the kmer graph.

6.2 years ago
kamiljaron ▴ 230

You can get an idea about ploidy from raw sequencing reads by looking at the kmer spectra. We designed a method called smudgeplot. It extracts kmer pairs with near-identical sequences, which usually represent heterozygous kmers, and plots them in a way that is indicative of ploidy. You can find details about this in our recent preprint or the linked GitHub repository.

--- edit ---

"GenomeScope 2.0 and Smudgeplot", describing tools for the characterisation of genomes of any ploidy directly from sequencing reads, has since been published in a peer-reviewed journal.
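
To give a feel for the kmer-pair idea described in this answer, here is a toy sketch. It is not smudgeplot's implementation: the pairing strategy (kmers that differ at exactly one position), the function names and the example counts are assumptions chosen for illustration.

```python
from collections import defaultdict

def one_mismatch_pairs(counts):
    """Find pairs of kmers that differ at exactly one position.

    Each position is masked in turn; kmers sharing a masked form differ only
    there. Only groups of exactly two are kept, as a simple stand-in for
    'this looks like one heterozygous site'."""
    if not counts:
        return []
    k = len(next(iter(counts)))
    pairs = set()
    for pos in range(k):
        groups = defaultdict(list)
        for kmer in counts:
            groups[kmer[:pos] + "*" + kmer[pos + 1:]].append(kmer)
        for members in groups.values():
            if len(members) == 2:
                pairs.add(tuple(sorted(members)))
    return sorted(pairs)

def pair_summary(counts, pair):
    """Return (minor coverage / total coverage, total coverage) for a pair."""
    a, b = counts[pair[0]], counts[pair[1]]
    return min(a, b) / (a + b), a + b

# Hypothetical kmer coverages: two near-identical kmers and one unrelated.
counts = {"ACGTA": 30, "ACCTA": 10, "TTTTT": 40}
pairs = one_mismatch_pairs(counts)
for pair in pairs:
    print(pair, pair_summary(counts, pair))
```

In this toy case the pair has a minor-coverage fraction of 0.25 at a total coverage of 40, the kind of 1:3 signal that would be consistent with a tetraploid genome if many pairs clustered there.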
