GATK: How to filter variant sites of sample VCF that are not present in a set of control VCFs
2
0
Entering edit mode
10.3 years ago
ravast ▴ 20

I have to select variants sites from my sample which are not present in the control samples. I have tried this:

java -jar GenomeAnalysisTK.jar \
  -R /Users/bhaskaran/Desktop/1data/GRCm38_68.fa \
  -T CombineVariants \
  -V:CON1 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8404.vcf \
  -V:CON2 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8405.vcf \
  -V:CON3 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8406.vcf \
  -V:CON4 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8407.vcf \
  -V:CON5 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8408.vcf \
  -V:TED /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/parental_22.vcf \
  -o result101.vcf && \
java -jar GenomeAnalysisTK.jar \
  -R /Users/bhaskaran/Desktop/1data/GRCm38_68.fa \
  -T SelectVariants -V result101.vcf \
  -select "set == 'TED'" -o result102.vcf

Unfortunately, this command outputs absolutely no variant. No error message occurred. Is some one could help me?

sequencing SNP next-gen • 5.4k views
ADD COMMENT
1
Entering edit mode
10.3 years ago
Vivek ★ 2.7k

Does

awk '! /\#/' result101.vcf | grep -w "TED" | grep -vi "CON1\|CON2\|CON3\|CON4\|CON5\|Intersection" > TED_only.vcf

produce something?

ADD COMMENT
2
Entering edit mode

I tried with individual file also ..But it is still showing no variants present.....

ADD REPLY
2
Entering edit mode

Vivek as per your suggestion I tried this:

java -jar GenomeAnalysisTK.jar \
  -R /Users/bhaskaran/Desktop/1data/GRCm38_68.fa \
  -T CombineVariants \
  -V:CON1 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8404.vcf \
  -V:CON2 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8405.vcf \
  -V:CON3 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8406.vcf \
  -V:CON4 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8407.vcf \
  -V:CON5 /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/control/8408.vcf \
  -V:TED /Users/bhaskaran/Desktop/Tools/GenomeAnalysisTK-3.2-2/parental_22.vcf \
  -o result101.vcf -env

and then

awk '! /\#/' result101.vcf | grep -w "TED" | grep -vi "CON1\|CON2\|CON3\|CON4\|CON5\|Intersection" > TED_only.vcf

again I'm getting no result. But when I compared this file manually I could find many variants in my sample that are not found in controls

ADD REPLY
0
Entering edit mode

I'm not sure how you are doing a manual check. Do you want the variant to NOT be present in atleast one control or ANY control?

Can you post what you get from

awk '! /\#/' result101.vcf | cut -f8 | awk -F ';' '{print $NF}' | sort | uniq -c
ADD REPLY
0
Entering edit mode

I want variant not be present in a tleast one control. I got this result..

3937677 set=Intersection
ADD REPLY
1
Entering edit mode

That pretty much tells you that you have 3,937,677 variants in the VCF file and all of them are common to all samples.

ADD REPLY
0
Entering edit mode

OK. But I have a doubt. When I saw the result101.vcf for example at chrm 1 position -

3010834    .    A    G    63.13    .    AC=1;AF=0.500;AN=2;BaseQRankSum=-2.428;DP=182;Dels=0.00;FS=24.892;HaplotypeScore=1.6524;InbreedingCoeff=-0.2223;MQ=44.54;MQ0=1;MQRankSum=0.170;QD=5.26;ReadPosRankSum=0.852;set=Intersection    GT:AD:DP:GQ:PL    ./.    ./.    ./.    ./.    ./.    0/1:2,5:7:19:80,0,19.

Does that mean for my sample, the variant at position 3010834 is present whereas it is absent in the controls...sorry if I'm wrong...

ADD REPLY
1
Entering edit mode

That looks like that variant was not called in the first 5 samples but if that were the case, GATK shouldn't tell you that its in set=intersection. You could look at the individual control sample VCFs and check if that variant entry is present in them and if so what genotype it has.

ADD REPLY
0
Entering edit mode

You are correct. I'm trying the way as you said..

ADD REPLY
0
Entering edit mode

Its not geting

ADD REPLY
0
Entering edit mode
10.3 years ago
William ★ 5.3k

You can use bedools subtract.

If you have the control samples in multiple vcf files you can first combine those to one vcf file. And then use bedtools subtract.

ADD COMMENT
0
Entering edit mode

Williams i tried with bedttools..but its showing no new variant..

ADD REPLY

Login before adding your answer.

Traffic: 2331 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6