Question

Double Check VCF Calls in Corresponding .bam Files

0

5.5 years ago

kevin.stachelek ▴ 80

I have a set of variant calls from GATK Mutect2 in matched tumor/normal samples that are found in only a subset of my sequencing cohort. I'd like to compute the allele frequency at each site in every sample, to check whether these variants are being mistakenly filtered in some samples. I could just load each BAM file into IGV or the UCSC Genome Browser, but I'd rather have an approach that works directly on the source BAM files and calculates an allele frequency at a set of sites provided as a .bed file.

I have tried to use:

glactools
samtools mpileup to bcftools

but I jumped ship after struggling to either calculate allele frequency (bcftools) or operate on a restricted set of sites (glactools). I've settled on using bam-readcount and parsing the output files.
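For reference, the parsing step can be sketched as below. This is a minimal sketch, assuming the standard tab-separated bam-readcount layout (chromosome, position, reference base, depth, then one colon-separated block per base whose first two fields are the base and its read count). The function names are hypothetical, and the allele fraction here is simply the allele count divided by the reported depth.

```python
from typing import Dict, Tuple


def parse_bam_readcount_line(line: str) -> Tuple[str, int, str, int, Dict[str, int]]:
    """Parse one bam-readcount line into (chrom, pos, ref, depth, counts)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, depth = fields[0], int(fields[1]), fields[2], int(fields[3])
    counts: Dict[str, int] = {}
    for entry in fields[4:]:
        parts = entry.split(":")
        base = parts[0]
        if base == "=":
            # Placeholder block that carries no read counts.
            continue
        counts[base] = int(parts[1])
    return chrom, pos, ref.upper(), depth, counts


def allele_fraction(counts: Dict[str, int], allele: str, depth: int) -> float:
    """Fraction of reads supporting `allele`; 0.0 when the site is uncovered."""
    if depth == 0:
        return 0.0
    return counts.get(allele.upper(), 0) / depth
```

Running this over the per-sample output files and collecting `allele_fraction` for the Mutect2 alternate allele at each site gives a site-by-sample table, including samples where no call was made.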

Can you suggest alternatives?

sequencing • 1.3k views

ADD COMMENT • link updated 5.4 years ago by Biostar 20 • written 5.5 years ago by kevin.stachelek ▴ 80

1

Hi Kevin, I am the author of glactools, let me know if you have any questions

ADD REPLY • link 5.3 years ago by Gabriel R. ★ 2.9k

0

Hello,

Could you show some example lines of your VCF file? Usually this information can be found there, or at least can be calculated from it.

fin swimmer

ADD REPLY • link 5.5 years ago by finswimmer 16k

0

I believe that any caller will only make a call at a site that differs from the reference; otherwise the VCF files would be huge. Is it true that most callers have built-in filters for mapping quality and base quality, e.g. --min-MQ or --min-BQ for mpileup > bcftools?

Just to be clear, I'm interested in getting an allele frequency at a site in a sample where no variant was called, based on my observation of a variant at that same site in a different sample.
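To make the goal concrete, the site list could be built as the union of called positions across all per-sample VCFs and then queried in every BAM. The following is only a sketch under that assumption: `vcf_lines_to_bed` is a hypothetical helper, it reads plain-text VCF records, and it converts the 1-based VCF POS to BED's 0-based, half-open intervals spanning the REF allele.

```python
from typing import Iterable, List, Tuple


def vcf_lines_to_bed(vcf_lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Collect unique (chrom, start, end) BED intervals from VCF records."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref = line.split("\t")[:4]
        start = int(pos) - 1      # VCF POS is 1-based, BED start is 0-based
        end = start + len(ref)    # BED end is exclusive
        sites.add((chrom, start, end))
    return sorted(sites)
```

Feeding the concatenated records of all samples into this function yields one de-duplicated site list, so a variant called in one sample is also interrogated in the samples where it was not called.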

ADD REPLY • link 5.5 years ago by kevin.stachelek ▴ 80

0

I think that bcftools mpileup piped into bcftools call will do this, but only after you drastically reduce the QC thresholds. For example, look at the --pval-threshold parameter that can be passed to bcftools call.

With NGS data, though, a large proportion of the positions in your covered regions will show at least one erroneous base call, simply because of the high per-base error rates of NGS sequencers. A non-zero allele frequency at a site is therefore not, by itself, evidence of a real variant.
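As a rough illustration of that point, here is a back-of-the-envelope sketch. It assumes errors are independent between reads and uses a purely illustrative per-base error rate (the 1% in the usage note is an assumption, not a measured value); `prob_at_least_one_error` is a hypothetical helper name.

```python
def prob_at_least_one_error(depth: int, error_rate: float) -> float:
    """Probability that at least one of `depth` reads carries an erroneous base.

    Assumes each read errs independently with probability `error_rate`.
    """
    return 1.0 - (1.0 - error_rate) ** depth
```

With an assumed error rate of 0.01 and a depth of 100, `prob_at_least_one_error(100, 0.01)` is roughly 0.63, so most positions at that depth would be expected to show at least one non-reference read even with no true variant present.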

ADD REPLY • link 5.4 years ago by Kevin Blighe 89k