Question

Depth of coverage in BAM file does not match depth in VCF file

8.2 years ago

jsneaththompson ▴ 100

My lab has a variant calling pipeline where sorted BAM files are used as input for Pindel. The Pindel output files (D and SI) are then converted to VCF files for annotation.

However, when the sorted BAM files are loaded into IGV, the coverage track shows a depth at a variant's location that differs from the value obtained from the VCF files (AD + RD).

I thought the depth shown in the VCF files might be an average of the coverage over the length of the indel, but that doesn't appear to be the case. Is there an issue here, or are IGV coverage and VCF coverage not directly comparable? I would appreciate any advice; below is the code that generates the first VCF files from the sorted BAM files:

# Build one Pindel config file per sample: BAM path, insert size (350), sample label
for k in $(cat samplesPindel.txt); do
    ls "../sorted_bam/${k}.sorted.bam" > "${k}_pindelinput.txt"
    sed -i 's|\(.\+\)\.sorted\.bam|\1.sorted.bam\t350\t\1|' "${k}_pindelinput.txt"
done

# Run Pindel (${hg38} is assumed to hold the path to the reference FASTA)
for i in $(cat samplesPindel.txt); do
    pindel -f "${hg38}" -i "${i}_pindelinput.txt" -c ALL -o "${i}"
    # Create VCF files of deletions (D) and short insertions (SI) from the Pindel output
    pindel2vcf -p "${i}_D" -r "${hg38}" -R GRCh38 -d 201312 -G -v "${i}_D.vcf"
    pindel2vcf -p "${i}_SI" -r "${hg38}" -R GRCh38 -d 201312 -G -v "${i}_SI.vcf"
done

Pindel InDel IGV Coverage Variant Calling • 4.3k views

updated 8.2 years ago by Ming Tommy Tang ★ 4.7k • written 8.2 years ago by jsneaththompson ▴ 100

Most callers ignore duplicate reads, reads failing quality checks, discordant reads, low-MAPQ reads, etc., so the number of reads in the VCF DP field can be lower than in the BAM.

8.2 years ago by Pierre Lindenbaum 166k

In addition to Pierre's answer, there are also various ways of calculating coverage. For example, a position spanned by a deletion might be counted as covered, or it might not. Secondary/supplementary alignments might be counted toward coverage, or they might be ignored. Variant callers might trim the ends of reads, or they might not. They can also perform realignment.

So, it's not surprising that the results differ; I wouldn't worry about it.
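To see how much the read filters mentioned in these comments can change a count, here is a minimal sketch in POSIX shell. The helper name `count_sam_reads` is hypothetical (it is not part of Pindel, IGV, or samtools), and the filters it applies are only an illustration: it does not reproduce the exact filtering of any particular caller or viewer. It reads SAM records on stdin and counts those that survive a FLAG mask and a minimum MAPQ.

```shell
# count_sam_reads EXCLUDE_FLAGS MIN_MAPQ
# Hypothetical helper: counts SAM records on stdin that survive two filters.
#   EXCLUDE_FLAGS  decimal FLAG mask; a read is dropped if any of these bits is set
#                  (e.g. 1796 = 4 unmapped + 256 secondary + 512 QC fail + 1024 duplicate)
#   MIN_MAPQ       a read is dropped if its MAPQ is below this value
# The body runs in a subshell, so its variables do not leak into the caller.
count_sam_reads() (
    exclude_flags="${1:-0}"
    min_mapq="${2:-0}"
    count=0
    tab=$(printf '\t')
    while IFS="$tab" read -r qname flag rname pos mapq rest; do
        # Skip blank lines and SAM header lines (which start with '@')
        case "$qname" in
            ''|@*) continue ;;
        esac
        # Drop the read if any excluded FLAG bit is set
        [ $(( $flag & $exclude_flags )) -ne 0 ] && continue
        # Drop the read if its mapping quality is below the threshold
        [ "$mapq" -lt "$min_mapq" ] && continue
        count=$((count + 1))
    done
    echo "$count"
)
```

Assuming samtools is installed, piping the reads that overlap a variant position into the helper, e.g. `samtools view sample.sorted.bam chr1:1000000-1000000 | count_sam_reads 0 0` versus `... | count_sam_reads 1796 20` (file name and coordinates are placeholders), shows how far the raw and filtered counts diverge at that site. Neither number is expected to match the VCF exactly, since callers may also realign or trim reads.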

8.2 years ago by Brian Bushnell 20k

score 0 · Answer 1 · 2017-05-10

8.2 years ago

Ming Tommy Tang ★ 4.7k

As Pierre says, most callers filter the reads, so the read count in the VCF file will typically be lower than in the BAM file.
