Question

Calculating effective coverage/callable genome for variant calling from a bam file

0

Entering edit mode

5.3 years ago

Rubal ▴ 350

We have mapped whole-genome sequence data in bam format. We are testing a variety of variant callers to identify mutant sites. The variant callers we are using have various different filters that results in sites being excluded from variant calling. For example some variant callers require >=3 reads to call a variant. Another excluded reads with >3 mismatches to the reference genome.

We would like to know what is the effective coverage that a variant caller sees for variant calling, given that many sites and reads are excluded by such filters. This would enable us to calculate our sensitivity to call variants across the genome and identify regions where no variants can be called.

We think the best way to do this would be a script or tool that reads a bam file and outputs a bigWig file with the effective coverage across the genome based on applying these filters. This could be adapted with new filters depending on the variant caller. Is there an existing tool that can take a bam file and output such information based on the two filters of minimum read depth >X and number of mismatches in a read < X?

Thanks in advance for any help and suggestions.

genome bam bigwig coverage variant-calling • 1.5k views

ADD COMMENT • link updated 5.3 years ago by Pierre Lindenbaum 164k • written 5.3 years ago by Rubal ▴ 350

score 1 · Answer 1 · 2019-07-31

Is there an existing tool that can take a bam file and output such information based on the two filters of minimum read depth >X and number of mismatches in a read < X?

create a first bed file with the coverage depth >X

samtools depth in.bam | awk -F '\t' '{if($3>10) printf("%s\t%d\t%s\n",$1,int($2)-1,$2);}' | bedtools merge > 1.bed

create a second bed file with NM < X

 java -jar dist/samjdk.jar --samoutputformat BAM  -e 'return !record.getReadUnmappedFlag() && (record.getIntegerAttribute("NM")==null || record.getIntegerAttribute("NM") < 10);' src/test/resources/S1.bam | bedtools  bamtobed > 2.bed

combine both bed files with 'bedtools intersect'