I have a set of variant calls from GATK Mutect2 in matched tumor/normal samples that are found in a subset of my sequencing cohort. I'd like to determine the allele frequency at each site in every sample, to check whether these variants are being mistakenly filtered in some samples. I could just load each BAM file into IGV or UCSC, but I'd rather have an approach that uses the source BAM files and just calculates an allele frequency at a set of sites provided as a .bed file.
I have tried to use:
- glactools
- samtools mpileup to bcftools
but I jumped ship after struggling to either calculate allele frequency (bcftools) or operate on a restricted set of sites (glactools). I've settled on using bam-readcount and parsing the output files.
Can you suggest alternatives?
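For reference, the bam-readcount parsing step can be sketched roughly as follows. This is a minimal sketch, assuming bam-readcount's default tab-separated output (chromosome, position, reference base, depth, then one colon-separated entry per allele whose first two fields are the allele and its read count). The function names `parse_bam_readcount_line` and `allele_frequency` are hypothetical, and the example record is made up for illustration.

```python
from typing import Dict


def parse_bam_readcount_line(line: str) -> Dict[str, object]:
    """Parse one bam-readcount output line into depth and per-allele read counts.

    Assumes the default layout: chrom, pos, ref, depth, then entries such as
    'A:12:...' where the first field is the allele and the second is its count.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, depth = fields[0], int(fields[1]), fields[2].upper(), int(fields[3])
    counts: Dict[str, int] = {}
    for entry in fields[4:]:
        parts = entry.split(":")
        allele = parts[0]
        if allele == "=":  # placeholder entry, not a real allele
            continue
        counts[allele] = int(parts[1])
    return {"chrom": chrom, "pos": pos, "ref": ref, "depth": depth, "counts": counts}


def allele_frequency(record: Dict[str, object], alt: str) -> float:
    """Fraction of reads at this site supporting `alt` (0.0 if the site is uncovered)."""
    depth = record["depth"]
    if depth == 0:
        return 0.0
    return record["counts"].get(alt.upper(), 0) / depth


if __name__ == "__main__":
    # Made-up record: the 12 trailing per-allele statistics are zeroed for brevity.
    pad = ":0" * 12
    example = "\t".join(
        ["chr1", "12345", "C", "20", "=:0" + pad, "A:0" + pad,
         "C:15" + pad, "G:0" + pad, "T:5" + pad, "N:0" + pad]
    )
    rec = parse_bam_readcount_line(example)
    print(rec["chrom"], rec["pos"], "T fraction:", allele_frequency(rec, "T"))
```

Here the denominator is the depth column reported by bam-readcount; summing the per-allele counts instead is an equally reasonable choice if you want to exclude reads that were filtered out.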
Hi Kevin, I am the author of glactools. Let me know if you have any questions.
Hello,
could you show some example lines of your VCF file? Usually this information can be found there, or at least can be calculated from it.
fin swimmer
I believe that any caller will only make a call at a site that differs from the reference; otherwise VCF files would be huge. Is it true that most have built-in filters for mapping quality and base quality, e.g. --min-MQ or --min-BQ for mpileup > bcftools?
Just to be clear, I'm interested in getting an allele frequency at a site in a sample in which a variant was not called, based on my observation of a variant at that same site in a different sample.
I think that bcftools mpileup piped into bcftools call will do this, but only after you drastically reduce the QC thresholds. For example, look at the --pval-threshold parameter that can be passed to bcftools call. With NGS data, though, a large proportion of positions in your covered regions will show at least one erroneous base, owing to the extraordinarily high error rates associated with NGS sequencers.