Coverage for sample with many species
10.4 years ago
biobio ▴ 50

Hi,

I have sequencing data from grapevines and I'm trying to see which viruses are present. I was able to assemble most of a viral genome (15,000 out of 18,000 bp in one contig), and now I'm trying to estimate the coverage. Here's a breakdown of what I did:

raw reads -> trim adapters -> map to grape reference and remove mapped reads -> assemble trimmed, unmapped reads.

To estimate the coverage, I used the virus that had the largest contig (grapevine leafroll-associated virus 3) and mapped the trimmed, unmapped reads to it. I started with about 7 million reads, and 1.3 million of them mapped to the viral genome. The average read length was 50 bp and the total genome size is 18 kbp. Using the equation presented in this Illumina technote: http://res.illumina.com/documents/products/technotes/technote_coverage_calculation.pdf I get

Coverage = Length of read * number of reads / haploid genome length

Coverage = 50 * 1.3x10^6 / 1.8x10^4

Coverage = 3611x? Could that be right?
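
For anyone who wants to re-run the arithmetic, here is a minimal Python sketch of the same formula. The function name `estimate_coverage` is just an illustrative choice, and it assumes every mapped read contributes its full 50 bp to the genome, which is only an approximation:

```python
def estimate_coverage(read_length_bp, n_mapped_reads, genome_length_bp):
    """Average coverage = read length * number of reads / genome length."""
    return read_length_bp * n_mapped_reads / genome_length_bp


# Numbers from the post: 50 bp reads, 1.3 million mapped reads, 18 kbp genome.
coverage = estimate_coverage(50, 1_300_000, 18_000)
print(f"{coverage:.0f}x")  # prints 3611x
```

This is only the genome-wide average; it says nothing about how evenly the reads are spread along the genome.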

coverage Assembly • 2.3k views
10.4 years ago
Michele Busby ★ 2.2k

For viruses, we often get that kind of coverage.

To sanity check it, I would open the alignment in a viewer like IGV and see whether it looks about right.

RNA viruses have really uneven coverage, so you should expect regions of very high coverage and (alas) regions with none. I don't know about DNA viruses. The unevenness may be due to RNA secondary structure.
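
Besides looking at it in IGV, one rough way to put numbers on the unevenness is to summarize per-base depth. The sketch below is only an illustration and assumes you have three-column, tab-separated depth records (reference name, position, depth), such as the output of `samtools depth -a`, where `-a` also reports zero-depth positions. The function name `summarize_depth` and the tiny inline example are made up for this post:

```python
import io
from statistics import mean, median


def summarize_depth(lines):
    """Summarize per-base depth from tab-separated (ref, pos, depth) records."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    if not depths:
        raise ValueError("no depth records found")
    n_zero = sum(1 for d in depths if d == 0)
    return {
        "positions": len(depths),
        "mean": mean(depths),
        "median": median(depths),
        "min": min(depths),
        "max": max(depths),
        "fraction_zero": n_zero / len(depths),
    }


# Made-up four-position example, not real data.
example = "virus\t1\t0\nvirus\t2\t10\nvirus\t3\t5000\nvirus\t4\t30\n"
summary = summarize_depth(io.StringIO(example))
print(summary)
```

With real data you would pass an open file handle instead of the `io.StringIO` example. A mean far above the median, or a nonzero `fraction_zero`, would be consistent with the uneven coverage described above.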
