Double Check VCF Calls in Corresponding .bam Files
5.2 years ago

I have a set of variant calls from GATK Mutect2 on matched tumor/normal samples. The variants are found in only a subset of my sequencing cohort. I'd like to determine the allele frequency at each of these sites in every sample, to check whether the variants are being mistakenly filtered out in some samples. I could load each BAM file into IGV or the UCSC Genome Browser, but I'd rather have an approach that works directly on the source BAM files and calculates an allele frequency at a set of sites provided as a .bed file.

I have tried to use:

  1. glactools
  2. samtools mpileup to bcftools

but I jumped ship after struggling to either calculate allele frequency (bcftools) or operate on a restricted set of sites (glactools). I've settled on using bam-readcount and parsing the output files.
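For anyone taking the same route, here is a minimal sketch of parsing bam-readcount output into per-site allele fractions. It assumes the usual tab-separated layout (chromosome, position, reference base, depth, then one colon-separated block per base whose first two sub-fields are the base and its read count). The function names and the abbreviated example line are hypothetical, and using the reported depth column as the denominator is an assumption you may want to change.

```python
def parse_readcount_line(line):
    """Split one bam-readcount line into (chrom, pos, ref, depth, counts)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    ref, depth = fields[2].upper(), int(fields[3])
    counts = {}
    for entry in fields[4:]:
        parts = entry.split(":")
        base = parts[0]
        if base == "=":
            # Placeholder block emitted before the per-base blocks; skip it.
            continue
        counts[base] = int(parts[1])
    return chrom, pos, ref, depth, counts


def allele_fraction(counts, allele, depth):
    """Fraction of reads supporting `allele`; None when the site is uncovered."""
    if depth == 0:
        return None
    return counts.get(allele, 0) / depth


# Abbreviated, made-up line: real output carries more sub-fields per base.
example_line = "chr1\t12345\tC\t20\t=:0:0.00\tA:0:0.00\tC:15:60.00\tG:0:0.00\tT:5:60.00\tN:0:0.00"
chrom, pos, ref, depth, counts = parse_readcount_line(example_line)
vaf_t = allele_fraction(counts, "T", depth)  # 5 / 20 = 0.25
```

Running this over the output for every sample gives a site-by-sample table of fractions, including samples where Mutect2 made no call.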

Can you suggest alternatives?

sequencing • 1.2k views

Hi Kevin, I am the author of glactools. Let me know if you have any questions.


Hello,

Could you show some example lines of your VCF file? Usually this information can be found there, or at least can be calculated from it.

fin swimmer


I believe that any caller will only make a call at a site that differs from the reference; otherwise VCF files would be huge. Is it true that most callers also have built-in filters for mapping quality and base quality, e.g. --min-MQ or --min-BQ for mpileup > bcftools?

Just to be clear, I'm interested in getting an allele frequency at a site in a sample in which a variant was not called, based on my observation of a variant at that same site in a different sample.


I think that bcftools mpileup piped into bcftools call will do this, but only after you drastically reduce the QC thresholds. For example, look at the --pval-threshold parameter that can be passed to bcftools call.
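As a sketch of how the allele frequency could be read straight from the pileup step: if bcftools mpileup is asked to annotate allelic depths (something like `bcftools mpileup -f ref.fa -R sites.bed -a FORMAT/AD sample.bam`; please check the option names against your installed version), each record carries an AD field with one read count per allele, in REF then ALT order. The Python below is a hypothetical helper, not part of bcftools, and the example record is made up.

```python
def ad_fractions(vcf_line, sample_index=0):
    """Map each allele in a VCF record to its share of the AD read counts."""
    fields = vcf_line.rstrip("\n").split("\t")
    alleles = [fields[3]] + fields[4].split(",")
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    ad = dict(zip(format_keys, sample_values)).get("AD")
    if ad is None:
        return {}  # record was produced without the AD annotation
    depths = [0 if d == "." else int(d) for d in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return {allele: None for allele in alleles}
    return {allele: d / total for allele, d in zip(alleles, depths)}


# Made-up record; "<*>" is the symbolic "any other allele" that mpileup emits.
example_record = "chr1\t12345\t.\tC\tT,<*>\t0\t.\tDP=20\tPL:AD\t0,10,100,30,110,120:15,5,0"
fractions = ad_fractions(example_record)
```

Because the fractions come from the pileup rather than from called genotypes, they are available even at sites where bcftools call would not report a variant.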

With NGS data, though, a large proportion of positions in your covered regions will show at least one erroneous base, because of the relatively high per-base error rates of NGS sequencers.
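To put a rough number on this, here is a small sketch that estimates the chance of seeing at least one erroneous base at a site. It assumes errors are independent between reads and uses an illustrative per-base error rate of 0.001 (the rate implied by a Phred quality of 30); neither assumption comes from the original poster's data.

```python
def prob_any_error(depth, per_base_error):
    """P(at least one erroneous base) at a site covered by `depth` reads."""
    return 1.0 - (1.0 - per_base_error) ** depth


# Illustrative values only: 100x coverage, error rate 0.001.
p_site = prob_any_error(depth=100, per_base_error=0.001)
```

Under these assumptions roughly one site in ten at 100x coverage carries at least one erroneous read, so a single non-reference read at a site is weak evidence on its own.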
