Question

Efficent Estimation of Sequencing Coverage

0

Entering edit mode

6.3 years ago

dario.garvan ▴ 520

I am using MELT, a tool which annoyingly requires the user to specify the coverage of each BAM file as an input parameter.

Please use a tool to determine approximate (+/- 2x) coverage prior to analysis.

The data I have is whole genome sequencing data of 55 humans, which, based on file size of the FASTQ files, I'd guess varies between about 40 times and 100 times coverage. I hesitate to use samtools to calculate this. What's an approach that provides a good balance between runtime and accuracy? The vagueness of the instruction suggests that there might not be such a solution readily available.

samtools coverage • 1.6k views

ADD COMMENT • link updated 6.3 years ago by harold.smith.tarheel ★ 5.0k • written 6.3 years ago by dario.garvan ▴ 520

score 2 · Accepted Answer · 2019-02-06

2

Entering edit mode

6.3 years ago

harold.smith.tarheel ★ 5.0k

Read length x # of reads x % aligned / size of genome = coverage

Coverage scales ~linearly with FASTQ file size (but not gzipped FASTQs), so you can determine the coverage for a few representative files (small/medium/large) and calculate the others based on file size.

ADD COMMENT • link 6.3 years ago by harold.smith.tarheel ★ 5.0k

0

Entering edit mode

The percentage of reads aligned isn't output by bwa mem at the end of alignment process, unlike Bowtie 2, unfortunately.

ADD REPLY • link 6.3 years ago by dario.garvan ▴ 520

2

Entering edit mode

Samtools 'flagstat' outputs the number of mapped reads in your BAM file (plus other metrics).

ADD REPLY • link 6.3 years ago by harold.smith.tarheel ★ 5.0k

1

Entering edit mode

You can use samtools idxstats to get number of reads that are aligned to any given chromosome, which would be essentially be what you are looking for (don't need any percentage as this gives you the number of reads actually aligned). Sum over all chromosomes (exclude the * line which are unmapped reads) and you're good to go

ADD REPLY • link 6.3 years ago by cmdcolin ★ 4.2k