Remove a few samples and their related information from a dbSNP VCF
8.3 years ago
zengtony743 ▴ 80

Hi, I have a question. I have a dbSNP file (VCF format) that I use to filter my own variants. However, I do not need the SNPs from some of the samples in it, so I want to remove those samples from the dbSNP VCF. Are there any tools that can do this? I tried GATK SelectVariants, but it did not work.

1) Header and first record of the dbSNP VCF file:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129P2 129S1 129S5 AJ AKRJ BALBcJ C3HHeJ C57BL6NJ CASTEiJ CBAJ DBA2J FVBNJ LPJ NODShiLtJ NZOHlLtJ PWKPhJ SPRETEiJ WSBEiJ
chr10 3100945 . C G 252.17 PASS AC1=0;AC=2;AF1=0;AN=36;DP4=127,322,1,9;DP=474;MDV=0;MQ=35;MSD=0;PV0=0.37;PV1=1;PV2=0.25;PV3=0.068;PV4=0.37,1,0.25,0.068;QD=0.0133;SB=0.3611;VDB=0.0253 GT:GQ:DP:SP:PL:FI 0/0:.:16:0:0,.,.:1 0/0:.:36:0:0,.,.:1 0/0:.:8:0:0,.,.:1 0/0:.:17:0:0,.,.:1 0/0:.:26:0:0,.,.:1 0/0:.:27:0:0,.,.:1 0/0:.:41:0:0,.,.:1 0/0:.:24:0:0,.,.:1 0/0:.:29:0:0,.,.:1 0/0:.:26:0:0,.,.:1 0/0:.:32:0:0,.,.:1 0/0:.:33:0:0,.,.:1 0/0:.:31:0:0,.,.:1 0/0:.:25:0:0,.,.

2) I need to keep all samples except 129S1, 129S5 and C57BL6NJ.

In other words, I want my final dbSNP file to look like this:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AJ AKRJ BALBcJ C3HHeJ CASTEiJ CBAJ DBA2J FVBNJ LPJ NODShiLtJ NZOHlLtJ PWKPhJ SPRETEiJ WSBEiJ
chr10 3100945 . C G 252.17 PASS AC1=0;AC=2;AF1=0;AN=36;DP4=127,322,1,9;DP=474;MDV=0;MQ=35;MSD=0;PV0=0.37;PV1=1;PV2=0.25;PV3=0.068;PV4=0.37,1,0.25,0.068;QD=0.0133;SB=0.3611;VDB=0.0253 GT:GQ:DP:SP:PL:FI 0/0:.:16:0:0,.,.:1 0/0:.:36:0:0,.,.:1 0/0:.:8:0:0,.,.:1 0/0:.:17:0:0,.,.:1 0/0:.:26:0:0,.,.:1 0/0:.:27:0:0,.,.:1 0/0:.:41:0:0,.,.:1 0/0:.:24:0:0,.,.:1 0/0:.:29:0:0,.,.:1 0/0:.:26:0:0,.,.:1 0/0:.:32:0:0,.,.:1 0/0:.:33:0:0,.,.:1 0/0:.:31:0:0,.,.:1 0/0:.:25:0:0,.,.

3) I tried to extract all samples except 129S1, 129S5 and C57BL6NJ from the dbSNP VCF file using GATK SelectVariants. It does not work on the dbSNP VCF file, although SelectVariants works on my own VCF file. The command I used is:

$ java -jar GenomeAnalysisTK.jar -R genome.fa -T SelectVariants --variant dbSNP.vcf -o final_dbSNP.vcf -sn AJ -sn AKRJ -sn BALBcJ -sn C3HHeJ -sn CASTEiJ -sn CBAJ -sn DBA2J -sn FVBNJ -sn LPJ -sn NODShiLtJ -sn NZOHlLtJ -sn PWKPhJ -sn SPRETEiJ -sn WSBEiJ &
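The names passed to `-sn` have to match the sample names in the VCF header exactly, so one quick check is to print the sample columns of the `#CHROM` line one per line and compare them with the list above. A minimal sketch, assuming an uncompressed, tab-delimited VCF; `list_samples` is a hypothetical helper name, not part of GATK:

```shell
# Print one sample name per line from the #CHROM header line of a VCF.
# Assumes an uncompressed, tab-delimited file; list_samples is a hypothetical helper.
list_samples() {
  # Take the first line starting with #CHROM, keep columns 10 onward
  # (the sample columns), and put each name on its own line.
  grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Usage (file name taken from the command above):
# list_samples dbSNP.vcf
```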

8.3 years ago
Zaag ▴ 870

This should remove the samples you want removed:

java -jar GenomeAnalysisTK.jar -R genome.fa -T SelectVariants --variant dbSNP.vcf -o final_dbSNP.vcf -xl_sn 129S1 -xl_sn 129S5  -xl_sn C57BL6NJ
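If GATK is not at hand, the same column removal can be sketched with awk. This is a minimal sketch, assuming an uncompressed, tab-delimited VCF; `drop_samples` is a hypothetical helper name. It removes the named sample columns from the `#CHROM` line and from every record, and copies `##` meta lines unchanged. It does not touch the INFO field, so values such as AC and AN still describe the original set of samples.

```shell
# Remove the named sample columns from an uncompressed, tab-delimited VCF.
# drop_samples is a hypothetical helper, not part of GATK or any VCF toolkit.
# Usage: drop_samples 129S1,129S5,C57BL6NJ < dbSNP.vcf > final_dbSNP.vcf
drop_samples() {
  awk -v drop="$1" 'BEGIN {
      FS = OFS = "\t"
      # Build a lookup table of sample names to drop.
      n = split(drop, names, ",")
      for (i = 1; i <= n; i++) skip_name[names[i]] = 1
    }
    # Meta-information lines (##...) are copied unchanged.
    /^##/ { print; next }
    # On the #CHROM line, record which sample columns (10 onward) to drop.
    /^#CHROM/ {
      for (i = 10; i <= NF; i++) if (($i) in skip_name) skip_col[i] = 1
    }
    # For the header and every record, print all columns except the dropped ones.
    # Column 1 is never dropped, so it can start the output line.
    {
      out = $1
      for (i = 2; i <= NF; i++)
        if (!(i in skip_col)) out = out OFS $i
      print out
    }'
}
```

bcftools can do the same selection with `bcftools view -s ^129S1,129S5,C57BL6NJ`, where the leading `^` excludes the listed samples.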

Thank you very much, Zaag. That works!
