kmer counting for heterozygosity estimation
4.9 years ago
el97004 ▴ 80

Hi all,

I want to count kmers in my sequencing reads in order to estimate the heterozygosity of my genome using GenomeScope. I have paired-end reads (R1.fastq, R2.fastq), and I ran Jellyfish to count kmers using the following settings:

jellyfish count -C -m 21 -s 5000000 -t 8 R1.fastq -o reads.jf

But on further thought, I realized I should maybe incorporate R2.fastq as well, so I ran the command as follows:

jellyfish count -C -m 21 -s 5000000 -t 8 R*.fastq -o reads.jf

This works, but the resulting heterozygosity values from GenomeScope differ. I was wondering if anyone had input on the right way to count kmers in paired-end sequencing data for downstream heterozygosity estimation.
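To make the difference between the two runs concrete, here is a minimal Python sketch of what canonical counting (the `-C` flag) over one file versus both files amounts to. It is an illustration, not Jellyfish's implementation. The function names are hypothetical, the FASTQ reader assumes plain four-line records, and k-mers containing anything other than A, C, G, T are skipped as a simplification. Pooling both mates adds roughly as many reads again, so k-mer multiplicities rise and the histogram shifts, which is one plausible reason the GenomeScope estimates differ between runs.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def fastq_sequences(path):
    """Yield the sequence line of each record (assumes 4-line FASTQ records)."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:
                yield line.strip().upper()


def count_canonical_kmers(sequences, k=21):
    """Count canonical k-mers pooled over every sequence in the iterable."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if _BASES.issuperset(kmer):  # simplification: skip k-mers with N etc.
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Return sorted (multiplicity, number of distinct k-mers) pairs."""
    return sorted(Counter(counts.values()).items())


# Toy data: the second read is the reverse complement of the first.
one_file = count_canonical_kmers(["AACG"], k=3)
both_files = count_canonical_kmers(["AACG", "CGTT"], k=3)
print(kmer_histogram(one_file))    # [(1, 2)]
print(kmer_histogram(both_files))  # [(2, 2)]
```

On real data the two files could be pooled with `itertools.chain(fastq_sequences("R1.fastq"), fastq_sequences("R2.fastq"))`. A pure-Python counter like this is only practical for small test files.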

Any ideas are greatly appreciated. Thank you!

jellyfish kmer heterozygosity genome assembly • 1.9k views
4.9 years ago

First of all, if you check the quality of your forward and reverse reads, you will probably find that the reverse reads are of lower quality than the forward reads. As a result, the k-mer frequencies calculated from the two sets of reads will differ. (Hypothetically, if you clustered forward and reverse reads by k-mer frequency, the mates would fall into separate clusters because of the lower quality of the reverse reads.) If it suits your aim, I suggest merging (assembling) the read pairs first and then calculating k-mer frequencies. I have used compseq from the EMBOSS toolkit to calculate k-mer frequencies.
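One quick way to check the quality claim on your own data is to compare the mean base quality of the two files. The sketch below is a rough illustration under stated assumptions: the function names are hypothetical, records are plain four-line FASTQ, and qualities are Phred+33 encoded (the `offset` parameter). A dedicated QC tool will give a far more detailed picture.

```python
def fastq_qualities(path):
    """Yield the quality line of each record (assumes 4-line FASTQ records)."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 3:
                yield line.rstrip("\n")


def mean_phred(quality_lines, offset=33):
    """Mean Phred score over all bases; offset=33 assumes Phred+33 encoding."""
    total = 0
    n_bases = 0
    for qualities in quality_lines:
        total += sum(ord(char) - offset for char in qualities)
        n_bases += len(qualities)
    return total / n_bases if n_bases else float("nan")


# Toy quality strings: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(mean_phred(["IIII", "IIII"]))  # 40.0
print(mean_phred(["II55", "5555"]))  # 25.0
```

For real files, `mean_phred(fastq_qualities("R1.fastq"))` and `mean_phred(fastq_qualities("R2.fastq"))` can be compared directly.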
