I am using MELT, a tool which annoyingly requires the user to specify the coverage of each BAM file as an input parameter.
Please use a tool to determine approximate (+/- 2x) coverage prior to analysis.
The data I have is whole genome sequencing data of 55 humans, which, based on file size of the FASTQ files, I'd guess varies between about 40 times and 100 times coverage. I hesitate to use samtools
to calculate this. What's an approach that provides a good balance between runtime and accuracy? The vagueness of the instruction suggests that there might not be such a solution readily available.
The percentage of reads aligned isn't output by
bwa mem
at the end of alignment process, unlike Bowtie 2, unfortunately.Samtools 'flagstat' outputs the number of mapped reads in your BAM file (plus other metrics).
You can use samtools idxstats to get number of reads that are aligned to any given chromosome, which would be essentially be what you are looking for (don't need any percentage as this gives you the number of reads actually aligned). Sum over all chromosomes (exclude the * line which are unmapped reads) and you're good to go