No filtering occuring with GATK'S VariantFiltration Walker
1
0
Entering edit mode
9.6 years ago

(This has been x-posted to the GATK forum.)

I have a file called rawSnps.vcf, which I've been trying to filter through the VariantFiltration Walker. The command I've been using is as follows:

/usr/java/latest/bin/java \
  -Xmx2g \
  -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar \
  -T VariantFiltration \
  -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa \
  -V Papi_C02_1/rawSnps.vcf \
  --filterExpression "QD < 2 || FS > 60 || MQ < 40 || MappingQualityRankSum < -12.5 || ReadPosRankSum < -8" \
  --filterName "snpfilter" \
  -o Papi_C02_1/filtered_snps.vcf -allowPotentiallyMisencodedQuals

I noticed that this wasn't working, so I switched to a more simple command, like this:

/usr/java/latest/bin/java \
  -Xmx2g \
  -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar \
  -T VariantFiltration \
  -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa \
  -V rawSnps.vcf \
  --filterExpression "MQ < 30" \
  --filterName "snpfilter" \
  -o filtered_snps_test.vcf \
  -allowPotentiallyMisencodedQuals

However, that isn't working either.

The output I get from GATK looks like this:

/usr/java/latest/bin/java -Xmx2g -jar ~/bin/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V rawSnps.vcf --filterExpression "MQ < 30" --filterName "snpfilter" -o filtered_snps_test.vcf -allowPotentiallyMisencodedQuals
INFO  10:03:14,434 HelpFormatter - --------------------------------------------------------------------------------
INFO  10:03:14,436 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.3-0-g37228af, Compiled 2014/10/24 01:07:22
INFO  10:03:14,437 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  10:03:14,437 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  10:03:14,442 HelpFormatter - Program Args: -T VariantFiltration -R /home/rothlab/agopal/bin/bowtie2-2.2.3/indexes/s_cerevisiae.fa -V rawSnps.vcf --filterExpression MQ < 30 --filterName snpfilter -o filtered_snps_test.vcf -allowPotentiallyMisencodedQuals
INFO  10:03:14,445 HelpFormatter - Executing as agopal@guru-int.mshri.on.ca on Linux 2.6.32-431.11.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13.
INFO  10:03:14,445 HelpFormatter - Date/Time: 2015/05/04 10:03:14
INFO  10:03:14,445 HelpFormatter - --------------------------------------------------------------------------------
INFO  10:03:14,445 HelpFormatter - --------------------------------------------------------------------------------
INFO  10:03:15,216 GenomeAnalysisEngine - Strictness is SILENT
INFO  10:03:15,308 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  10:03:15,439 GenomeAnalysisEngine - Preparing for traversal
INFO  10:03:15,440 GenomeAnalysisEngine - Done preparing for traversal
INFO  10:03:15,440 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  10:03:15,441 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  10:03:15,441 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
INFO  10:03:17,682 ProgressMeter -            done      7942.0     2.0 s       4.7 m       99.9%     2.0 s       0.0 s
INFO  10:03:17,682 ProgressMeter - Total runtime 2.24 secs, 0.04 min, 0.00 hours
INFO  10:03:18,554 GATKRunReport - Uploaded run statistics report to AWS S3

i.e., you there is no error report. However, when I manually look at filtered_snps_test.vcf, I can see that many SNPs that have a MQ value below 30 (or 40, if that's what I've used) have still been passed in the filter.

I've also done a raw line-count of rawSnps.vcf and filtered_snps_test.vcf, and starting at the #CHROM position, both line counts are the same.

I'm not sure what's going on here. You can see both files here:

rawSnps.vcf: https://www.dropbox.com/s/868kkpna5n315r0/filtered_snps_test.vcf?dl=0

filtered_snps_test.vcf: https://www.dropbox.com/s/868kkpna5n315r0/filtered_snps_test.vcf?dl=0

Any help would be much appreciated.

Thanks!

GATK • 3.4k views
ADD COMMENT
0
Entering edit mode

MappingQualityRankSum is missing from your VCF header.

$ curl -sL "https://www.dropbox.com/s/868kkpna5n315r0/filtered_snps_test.vcf?dl=0" | grep MappingQualityRankSum

try removing this field from your test?

ADD REPLY
2
Entering edit mode
9.6 years ago
Cherry ▴ 50

Have you tried to use floating number when writing the command? (--filterExpression "MQ < 30.0")

ADD COMMENT
0
Entering edit mode

Yeah, this was the problem. I just figured it out too.

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 1518 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6