Efficent Estimation of Sequencing Coverage
1
0
Entering edit mode
5.9 years ago
dario.garvan ▴ 520

I am using MELT, a tool which annoyingly requires the user to specify the coverage of each BAM file as an input parameter.

Please use a tool to determine approximate (+/- 2x) coverage prior to analysis.

The data I have is whole genome sequencing data of 55 humans, which, based on file size of the FASTQ files, I'd guess varies between about 40 times and 100 times coverage. I hesitate to use samtools to calculate this. What's an approach that provides a good balance between runtime and accuracy? The vagueness of the instruction suggests that there might not be such a solution readily available.

samtools coverage • 1.5k views
ADD COMMENT
2
Entering edit mode
5.9 years ago

Read length x # of reads x % aligned / size of genome = coverage

Coverage scales ~linearly with FASTQ file size (but not gzipped FASTQs), so you can determine the coverage for a few representative files (small/medium/large) and calculate the others based on file size.

ADD COMMENT
0
Entering edit mode

The percentage of reads aligned isn't output by bwa mem at the end of alignment process, unlike Bowtie 2, unfortunately.

ADD REPLY
2
Entering edit mode

Samtools 'flagstat' outputs the number of mapped reads in your BAM file (plus other metrics).

ADD REPLY
1
Entering edit mode

You can use samtools idxstats to get number of reads that are aligned to any given chromosome, which would be essentially be what you are looking for (don't need any percentage as this gives you the number of reads actually aligned). Sum over all chromosomes (exclude the * line which are unmapped reads) and you're good to go

ADD REPLY

Login before adding your answer.

Traffic: 4205 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6