REF, ALT not recoded after removing individual sample in VCFtools and VCFlib
9.5 years ago

Hi,

I have a VCF file and realized that one individual is an outlier and often differs from the others. I'd like to remove this individual and have the ALT and REF columns follow suit. For example, if only the removed individual had the ALT allele, the ALT column would then contain a "." instead of the ALT allele that is no longer present in the remaining samples. The same applies to indels.

I have tried this with VCFtools and VCFlib, but neither recodes the ALT and REF columns. Does anyone know of a tool that can do this without having to remake the entire VCF file?
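
To make the desired behaviour concrete, here is a minimal Python sketch of the recoding step. It is only an illustration, not an existing tool: the function names `recode_alt` and `remap_gt` are hypothetical, it looks at the GT subfield only, and it does not adjust per-allele FORMAT fields such as AD or PL, nor the INFO column.

```python
import re


def recode_alt(alt, genotypes):
    """Keep only the ALT alleles still called in the remaining genotypes.

    alt       -- the ALT column as a string, e.g. "TGG,T" or "."
    genotypes -- GT strings of the remaining samples, e.g. ["0/0", "./.", "0/2"]
    Returns (new_alt, index_map); index_map maps old allele indices to
    new ones (index 0 is always REF).
    """
    alts = [] if alt == "." else alt.split(",")
    used = set()
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele != ".":          # skip uncalled alleles
                used.add(int(allele))
    kept = [i for i in range(1, len(alts) + 1) if i in used]
    index_map = {0: 0}
    for new, old in enumerate(kept, start=1):
        index_map[old] = new
    new_alt = ",".join(alts[i - 1] for i in kept) or "."
    return new_alt, index_map


def remap_gt(gt, index_map):
    """Rewrite one GT string using the renumbered allele indices."""
    parts = re.split(r"([/|])", gt)    # keep the phase separators
    return "".join(
        p if p in ("/", "|", ".") else str(index_map[int(p)]) for p in parts
    )


# Hypothetical site: the removed sample was the only carrier of allele "A".
new_alt, index_map = recode_alt("A,C", ["0/0", "0/2", "./."])
print(new_alt)                         # C
print(remap_gt("0/2", index_map))      # 0/1
```

With every remaining genotype at 0/0 or ./., `recode_alt` returns "." for ALT, which is the behaviour described above.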

Thank you,
Suzanne

next-gen genome alignment • 3.4k views

Thank you for your help. I am using your software and it is getting me most of the way there. Thanks!

Would you please look at these lines:

I think the REF allele and the INFO/FORMAT fields at positions 173 and 175 no longer need to indicate that a deletion was present, because those lines no longer exist in the dataset. Is this intentional?

KB871578.1    173    .    GC    .    44.11    PASS    AC=1;AF=0.014;AN=74;BaseQRankSum=-7.360e-01;ClippingRankSum=-7.360e-01;DP=179;FS=0.000;GQ_MEAN=11.57;GQ_STDDEV=12.08;InbreedingCoeff=-0.0871;MLEAC=1;MLEAF=0.014;MQ=60.00;MQ0=0;MQRankSum=-7.360e-01;NCC=9;QD=14.70;ReadPosRankSum=0.736;SOR=1.179    GT:AD:DP:GQ:PL    0/0:2,0:2:3:0,3,45    0/0:4,0:4:12:0,12,165    0/0:2,0:2:3:0,3,45    0/0:3,0:3:6:0,6,90    0/0:4,0:4:9:0,9,135    0/0:4,0:4:12:0,12,173    0/0:8,0:8:15:0,15,225    0/0:2,0:2:3:0,3,45    0/0:1,0:1:3:0,3,45    ./.    ./.    ./.    ./.    ./.    ./.    ./.    ./.    ./.    0/0:6,0:6:12:0,12,180    0/0:9,0:9:18:0,18,270    0/0:5,0:5:12:0,12,180    0/0:5,0:5:3:0,3,45    0/0:1,0:1:3:0,3,36    0/0:4,0:4:12:0,12,176    0/0:2,0:2:6:0,6,82    0/0:6,0:6:15:0,15,225    0/0:2,0:2:6:0,6,85    0/0:10,0:10:18:0,18,270    0/0:4,0:4:12:0,12,173    0/0:5,0:5:15:0,15,215    0/0:9,0:9:21:0,21,315    0/0:4,0:4:12:0,12,167    0/0:4,0:4:9:0,9,135    0/0:3,0:3:6:0,6,90    0/0:5,0:5:15:0,15,221    0/0:2,0:2:6:0,6,90    0/0:3,0:3:9:0,9,128    0/0:5,0:5:15:0,15,216    0/0:3,0:3:9:0,9,132    0/0:4,0:4:9:0,9,135    0/0:4,0:4:12:0,12,169    0/0:3,0:3:9:0,9,122    0/0:5,0:5:12:0,12,180    0/0:3,0:3:6:0,6,90
KB871578.1    174    .    C    .    .    PASS    AN=90;DP=173;NCC=1    GT:AD:DP    0/0:3:3    0/0:4:4    0/0:2:2    0/0:4:4    0/0:4:4    0/0:4:4    0/0:8:8    0/0:2:2    0/0:1:1    ./.    0/0:0:1    0/0:2:5    0/0:2:5    0/0:1:1    0/0:1:2    0/0:0:2    0/0:0:2    0/0:0:2    0/0:6:6    0/0:9:9    0/0:5:5    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:10:10    0/0:4:4    0/0:5:5    0/0:9:9    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
KB871578.1    175    .    TGCTGC    .    43.94    PASS    AC=1;AF=0.013;AN=76;BaseQRankSum=-7.360e-01;ClippingRankSum=0.736;DP=181;FS=0.000;GQ_MEAN=11.50;GQ_STDDEV=11.94;InbreedingCoeff=0.0019;MLEAC=1;MLEAF=0.013;MQ=60.00;MQ0=0;MQRankSum=0.736;NCC=8;QD=8.79;ReadPosRankSum=0.736;SOR=1.179    GT:AD:DP:GQ:PL    0/0:3,0:3:6:0,6,90    0/0:4,0:4:12:0,12,161    0/0:2,0:2:3:0,3,45    0/0:4,0:4:9:0,9,135    0/0:4,0:4:9:0,9,135    0/0:4,0:4:12:0,12,168    0/0:8,0:8:15:0,15,225    0/0:2,0:2:3:0,3,45    0/0:1,0:1:3:0,3,45    ./.    ./.    ./.    ./.    ./.    0/0:1,1:2:0:0,0,1    ./.    ./.    ./.    0/0:6,0:6:12:0,12,180    0/0:9,0:9:18:0,18,270    0/0:5,0:5:12:0,12,180    0/0:6,0:6:6:0,6,90    0/0:1,0:1:3:0,3,40    0/0:4,0:4:12:0,12,172    0/0:2,0:2:6:0,6,82    0/0:6,0:6:15:0,15,225    0/0:2,0:2:6:0,6,85    0/0:10,0:10:18:0,18,270    0/0:4,0:4:12:0,12,171    0/0:5,0:5:15:0,15,216    0/0:9,0:9:21:0,21,315    0/0:4,0:4:12:0,12,177    0/0:4,0:4:9:0,9,135    0/0:3,0:3:6:0,6,90    0/0:5,0:5:15:0,15,211    0/0:2,0:2:6:0,6,90    0/0:3,0:3:9:0,9,134    0/0:5,0:5:15:0,15,214    0/0:3,0:3:9:0,9,130    0/0:4,0:4:9:0,9,135    0/0:4,0:4:12:0,12,173    0/0:3,0:3:9:0,9,130    0/0:5,0:5:12:0,12,180    0/0:3,0:3:6:0,6,90
KB871578.1    176    .    G    .    .    PASS    AN=90;DP=170;NCC=1    GT:AD:DP    0/0:3:3    0/0:4:4    0/0:2:2    0/0:4:4    0/0:4:4    0/0:4:4    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:0:1    0/0:2:5    0/0:2:5    0/0:1:1    0/0:0:1    0/0:0:2    0/0:0:2    0/0:0:2    0/0:6:6    0/0:8:8    0/0:5:5    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:9:9    0/0:4:4    0/0:5:5    0/0:9:9    0/0:4:4    0/0:4:4    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
KB871578.1    177    .    C    .    .    PASS    AN=88;DP=169;NCC=2    GT:AD:DP    0/0:3:3    0/0:4:4    0/0:2:2    0/0:4:4    0/0:4:4    0/0:4:4    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:0:1    0/0:0:3    0/0:0:4    ./.    0/0:0:1    0/0:0:2    0/0:0:2    0/0:0:2    0/0:6:6    0/0:8:8    0/0:5:5    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:9:9    0/0:4:4    0/0:6:6    0/0:10:10    0/0:4:4    0/0:5:5    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
KB871578.1    178    .    T    .    .    PASS    AN=88;DP=169;NCC=2    GT:AD:DP    0/0:3:3    0/0:5:5    0/0:2:2    0/0:4:4    0/0:4:4    0/0:5:5    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:1:1    0/0:3:3    0/0:3:4    ./.    0/0:1:1    0/0:2:2    0/0:2:2    0/0:2:2    0/0:5:5    0/0:8:8    0/0:5:5    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:8:8    0/0:4:4    0/0:6:6    0/0:10:10    0/0:4:4    0/0:5:5    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
KB871578.1    179    .    G    .    .    PASS    AN=88;DP=168;NCC=2    GT:AD:DP    0/0:2:2    0/0:5:5    0/0:2:2    0/0:4:4    0/0:4:4    0/0:5:5    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:1:1    0/0:3:3    0/0:4:4    ./.    0/0:1:1    0/0:2:2    0/0:2:2    0/0:2:2    0/0:5:5    0/0:7:7    0/0:6:6    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:8:8    0/0:4:4    0/0:6:6    0/0:10:10    0/0:4:4    0/0:5:5    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
KB871578.1    180    .    C    .    .    PASS    AN=88;DP=170;NCC=2    GT:AD:DP    0/0:2:2    0/0:5:5    0/0:2:2    0/0:4:4    0/0:4:4    0/0:5:5    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:1:1    0/0:3:3    0/0:3:4    ./.    0/0:1:1    0/0:2:2    0/0:3:3    0/0:4:4    0/0:5:5    0/0:7:7    0/0:6:6    0/0:5:5    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:8:8    0/0:4:4    0/0:6:6    0/0:10:10    0/0:4:4    0/0:5:5    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3

Thanks very much in advance!


Yes, you can remove those lines with the option '-r' (remove the variant if there is no called genotype on the line).


Thank you for your quick response. Here are my commands:

java \
  -jar /panfs/roc/groups/14/mcgaughs/mcgaughs/tools/jvarkit/dist-1.128/vcfcutsamples.jar \
  -f /panfs/roc/scratch/mcgaugh/VCFs/Not_AA.txt \
  -r /panfs/roc/scratch/mcgaugh/VCFs/PASS_SNP_invariant_INDELTEST2.vcf > /panfs/roc/scratch/mcgaugh/VCFs/TEST_PL3.vcf

I also tried:

java \
  -jar /panfs/roc/groups/14/mcgaughs/mcgaughs/tools/jvarkit/dist-1.128/vcfcutsamples.jar \
  -r \
  -f /panfs/roc/scratch/mcgaugh/VCFs/Not_AA.txt /panfs/roc/scratch/mcgaugh/VCFs/PASS_SNP_invariant_INDELTEST2.vcf > /panfs/roc/scratch/mcgaugh/VCFs/TEST_PL4.vcf

Both give:

173    .    GC    .    44.11    PASS    AC=1;AF=0.014;AN=74;BaseQRankSum=-7.360e-01;ClippingRankSum=-7.360e-01;DP=179;FS=0.000;GQ_MEAN=11.57;GQ_STDDEV=12.08;InbreedingCoeff=-0.0871;MLEAC=1;MLEAF=0.014;MQ=60.00;MQ0=0;MQRankSum=-7.360e-01;NCC=9;QD=14.70;ReadPosRankSum=0.736;SOR=1.179    GT:AD:DP:GQ:PL    0/0:2,0:2:3:0,3,45    0/0:4,0:4:12:0,12,165    0/0:2,0:2:3:0,3,45    0/0:3,0:3:6:0,6,90    0/0:4,0:4:9:0,9,135    0/0:4,0:4:12:0,12,173    0/0:8,0:8:15:0,15,225    0/0:2,0:2:3:0,3,45    0/0:1,0:1:3:0,3,45    ./.    ./.    ./.    ./.    ./.    ./.    ./.    ./.    ./.    0/0:6,0:6:12:0,12,180    0/0:9,0:9:18:0,18,270    0/0:5,0:5:12:0,12,180    0/0:5,0:5:3:0,3,45    0/0:1,0:1:3:0,3,36    0/0:4,0:4:12:0,12,176    0/0:2,0:2:6:0,6,82    0/0:6,0:6:15:0,15,225    0/0:2,0:2:6:0,6,85    0/0:10,0:10:18:0,18,270    0/0:4,0:4:12:0,12,173    0/0:5,0:5:15:0,15,215    0/0:9,0:9:21:0,21,315    0/0:4,0:4:12:0,12,167    0/0:4,0:4:9:0,9,135    0/0:3,0:3:6:0,6,90    0/0:5,0:5:15:0,15,221    0/0:2,0:2:6:0,6,90    0/0:3,0:3:9:0,9,128    0/0:5,0:5:15:0,15,216    0/0:3,0:3:9:0,9,132    0/0:4,0:4:9:0,9,135    0/0:4,0:4:12:0,12,169    0/0:3,0:3:9:0,9,122    0/0:5,0:5:12:0,12,180    0/0:3,0:3:6:0,6,90
174    .    C    .    .    PASS    AN=90;DP=173;NCC=1    GT:AD:DP    0/0:3:3    0/0:4:4    0/0:2:2    0/0:4:4    0/0:4:4    0/0:4:4    0/0:8:8    0/0:2:2    0/0:1:1    ./.    0/0:0:1    0/0:2:5    0/0:2:5    0/0:1:1    0/0:1:2    0/0:0:2    0/0:0:2    0/0:0:2    0/0:6:6    0/0:9:9    0/0:5:5    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:10:10    0/0:4:4    0/0:5:5    0/0:9:9    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
175    .    TGCTGC    .    43.94    PASS    AC=1;AF=0.013;AN=76;BaseQRankSum=-7.360e-01;ClippingRankSum=0.736;DP=181;FS=0.000;GQ_MEAN=11.50;GQ_STDDEV=11.94;InbreedingCoeff=0.0019;MLEAC=1;MLEAF=0.013;MQ=60.00;MQ0=0;MQRankSum=0.736;NCC=8;QD=8.79;ReadPosRankSum=0.736;SOR=1.179    GT:AD:DP:GQ:PL    0/0:3,0:3:6:0,6,90    0/0:4,0:4:12:0,12,161    0/0:2,0:2:3:0,3,45    0/0:4,0:4:9:0,9,135    0/0:4,0:4:9:0,9,135    0/0:4,0:4:12:0,12,168    0/0:8,0:8:15:0,15,225    0/0:2,0:2:3:0,3,45    0/0:1,0:1:3:0,3,45    ./.    ./.    ./.    ./.    ./.    0/0:1,1:2:0:0,0,1    ./.    ./.    ./.    0/0:6,0:6:12:0,12,180    0/0:9,0:9:18:0,18,270    0/0:5,0:5:12:0,12,180    0/0:6,0:6:6:0,6,90    0/0:1,0:1:3:0,3,40    0/0:4,0:4:12:0,12,172    0/0:2,0:2:6:0,6,82    0/0:6,0:6:15:0,15,225    0/0:2,0:2:6:0,6,85    0/0:10,0:10:18:0,18,270    0/0:4,0:4:12:0,12,171    0/0:5,0:5:15:0,15,216    0/0:9,0:9:21:0,21,315    0/0:4,0:4:12:0,12,177    0/0:4,0:4:9:0,9,135    0/0:3,0:3:6:0,6,90    0/0:5,0:5:15:0,15,211    0/0:2,0:2:6:0,6,90    0/0:3,0:3:9:0,9,134    0/0:5,0:5:15:0,15,214    0/0:3,0:3:9:0,9,130    0/0:4,0:4:9:0,9,135    0/0:4,0:4:12:0,12,173    0/0:3,0:3:9:0,9,130    0/0:5,0:5:12:0,12,180    0/0:3,0:3:6:0,6,90
176    .    G    .    .    PASS    AN=90;DP=170;NCC=1    GT:AD:DP    0/0:3:3    0/0:4:4    0/0:2:2    0/0:4:4    0/0:4:4    0/0:4:4    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:0:1    0/0:2:5    0/0:2:5    0/0:1:1    0/0:0:1    0/0:0:2    0/0:0:2    0/0:0:2    0/0:6:6    0/0:8:8    0/0:5:5    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:9:9    0/0:4:4    0/0:5:5    0/0:9:9    0/0:4:4    0/0:4:4    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
177    .    C    .    .    PASS    AN=88;DP=169;NCC=2    GT:AD:DP    0/0:3:3    0/0:4:4    0/0:2:2    0/0:4:4    0/0:4:4    0/0:4:4    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:0:1    0/0:0:3    0/0:0:4    ./.    0/0:0:1    0/0:0:2    0/0:0:2    0/0:0:2    0/0:6:6    0/0:8:8    0/0:5:5    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:9:9    0/0:4:4    0/0:6:6    0/0:10:10    0/0:4:4    0/0:5:5    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
178    .    T    .    .    PASS    AN=88;DP=169;NCC=2    GT:AD:DP    0/0:3:3    0/0:5:5    0/0:2:2    0/0:4:4    0/0:4:4    0/0:5:5    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:1:1    0/0:3:3    0/0:3:4    ./.    0/0:1:1    0/0:2:2    0/0:2:2    0/0:2:2    0/0:5:5    0/0:8:8    0/0:5:5    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:8:8    0/0:4:4    0/0:6:6    0/0:10:10    0/0:4:4    0/0:5:5    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
179    .    G    .    .    PASS    AN=88;DP=168;NCC=2    GT:AD:DP    0/0:2:2    0/0:5:5    0/0:2:2    0/0:4:4    0/0:4:4    0/0:5:5    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:1:1    0/0:3:3    0/0:4:4    ./.    0/0:1:1    0/0:2:2    0/0:2:2    0/0:2:2    0/0:5:5    0/0:7:7    0/0:6:6    0/0:6:6    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:8:8    0/0:4:4    0/0:6:6    0/0:10:10    0/0:4:4    0/0:5:5    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3
180    .    C    .    .    PASS    AN=88;DP=170;NCC=2    GT:AD:DP    0/0:2:2    0/0:5:5    0/0:2:2    0/0:4:4    0/0:4:4    0/0:5:5    0/0:7:7    0/0:2:2    0/0:1:1    ./.    0/0:1:1    0/0:3:3    0/0:3:4    ./.    0/0:1:1    0/0:2:2    0/0:3:3    0/0:4:4    0/0:5:5    0/0:7:7    0/0:6:6    0/0:5:5    0/0:1:1    0/0:4:4    0/0:2:2    0/0:6:6    0/0:2:2    0/0:8:8    0/0:4:4    0/0:6:6    0/0:10:10    0/0:4:4    0/0:5:5    0/0:4:4    0/0:5:5    0/0:2:2    0/0:3:3    0/0:5:5    0/0:3:3    0/0:4:4    0/0:4:4    0/0:3:3    0/0:5:5    0/0:3:3

I apologize in advance if I am doing something incorrectly on the command line. If you could please advise, I'd very much appreciate it.


My bad: all your variants are 0/0 (hom-ref), not 'uncalled'. You could quickly remove those variants by piping the VCF to https://github.com/lindenb/jvarkit/wiki/VCFFilterJS

java -jar dist/vcffilterjs.jar -e 'variant.getAlternateAlleles().size()>0'

Thank you for your response. Things aren't quite working. I tried both with the -e option and without it. Please see below.

But just to clarify, I would like to keep all sites in the file (i.e., not remove the 0/0 sites). I simply want to recode the REF, ALT, and INFO columns so that they reflect the data in the "new" VCF once I remove the rogue individual.

mcgaughs@labq01 [/panfs/roc/groups/14/mcgaughs/mcgaughs/tools/jvarkit] % java -jar /panfs/roc/groups/14/mcgaughs/mcgaughs/tools/jvarkit/dist-1.128/vcffilterjs.jar -f /panfs/roc/scratch/mcgaugh/VCFs/PASS_SNP_invariant_INDELTEST2.vcf -e 'variant.getAlternateAlleles().size()>0'
[INFO/VCFFilterJS] 2015-05-13 20:33:25 "Starting JOB at Wed May 13 20:33:25 CDT 2015 com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS version=840f289630f04c24db877d06f90404ff7c2b9639  built=2015-05-13:20-05-18"
[INFO/VCFFilterJS] 2015-05-13 20:33:25 "Command Line args : -f /panfs/roc/scratch/mcgaugh/VCFs/PASS_SNP_invariant_INDELTEST2.vcf -e variant.getAlternateAlleles().size()>0"
[INFO/VCFFilterJS] 2015-05-13 20:33:25 "Executing as mcgaughs@labq01.msi.umn.edu on Linux 2.6.32-504.16.2.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_79-mockbuild_2015_04_15_00_02-b00"
[SEVERE/VCFFilterJS] 2015-05-13 20:33:25 "both javascript file/expr are set"
[SEVERE/VCFFilterJS] 2015-05-13 20:33:25 "Initialization of VCFFilterJS failed."
[INFO/VCFFilterJS] 2015-05-13 20:33:25 "End JOB status=-1 [Wed May 13 20:33:25 CDT 2015] com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS done. Elapsed time: 0.00 minutes."
[SEVERE/VCFFilterJS] 2015-05-13 20:33:25 "##### ERROR: return status = -1################"
mcgaughs@labq01 [/panfs/roc/groups/14/mcgaughs/mcgaughs/tools/jvarkit] % java -jar /panfs/roc/groups/14/mcgaughs/mcgaughs/tools/jvarkit/dist-1.128/vcffilterjs.jar -f /panfs/roc/scratch/mcgaugh/VCFs/PASS_SNP_invariant_INDELTEST2.vcf 
[INFO/VCFFilterJS] 2015-05-13 20:33:38 "Starting JOB at Wed May 13 20:33:38 CDT 2015 com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS version=840f289630f04c24db877d06f90404ff7c2b9639  built=2015-05-13:20-05-18"
[INFO/VCFFilterJS] 2015-05-13 20:33:38 "Command Line args : -f /panfs/roc/scratch/mcgaugh/VCFs/PASS_SNP_invariant_INDELTEST2.vcf"
[INFO/VCFFilterJS] 2015-05-13 20:33:39 "Executing as mcgaughs@labq01.msi.umn.edu on Linux 2.6.32-504.16.2.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_79-mockbuild_2015_04_15_00_02-b00"
[INFO/VCFFilterJS] 2015-05-13 20:33:39 "Compiling /panfs/roc/scratch/mcgaugh/VCFs/PASS_SNP_invariant_INDELTEST2.vcf"
[SEVERE/VCFFilterJS] 2015-05-13 20:33:39 "sun.org.mozilla.javascript.EvaluatorException: illegal character (<Unknown Source>#1)"
javax.script.ScriptException: sun.org.mozilla.javascript.EvaluatorException: illegal character (<Unknown Source>#1)
    at com.sun.script.javascript.RhinoScriptEngine.compile(RhinoScriptEngine.java:392)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.AbstractVcfJavascript.initializeKnime(AbstractVcfJavascript.java:142)
    at com.github.lindenb.jvarkit.knime.AbstractKnimeApplication.mainWork(AbstractKnimeApplication.java:86)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS.doWork(VCFFilterJS.java:152)
    at com.github.lindenb.jvarkit.util.AbstractCommandLineProgram.instanceMain(AbstractCommandLineProgram.java:496)
    at com.github.lindenb.jvarkit.util.AbstractCommandLineProgram.instanceMainWithExit(AbstractCommandLineProgram.java:510)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS.main(VCFFilterJS.java:159)
Caused by: sun.org.mozilla.javascript.EvaluatorException: illegal character (<Unknown Source>#1)
    at sun.org.mozilla.javascript.DefaultErrorReporter.runtimeError(DefaultErrorReporter.java:109)
    at sun.org.mozilla.javascript.DefaultErrorReporter.error(DefaultErrorReporter.java:96)
    at sun.org.mozilla.javascript.Parser.addError(Parser.java:146)
    at sun.org.mozilla.javascript.TokenStream.getToken(TokenStream.java:825)
    at sun.org.mozilla.javascript.Parser.peekToken(Parser.java:172)
    at sun.org.mozilla.javascript.Parser.parse(Parser.java:384)
    at sun.org.mozilla.javascript.Parser.parse(Parser.java:359)
    at sun.org.mozilla.javascript.Context.compileImpl(Context.java:2370)
    at sun.org.mozilla.javascript.Context.compileReader(Context.java:1321)
    at sun.org.mozilla.javascript.Context.compileReader(Context.java:1293)
    at com.sun.script.javascript.RhinoScriptEngine.compile(RhinoScriptEngine.java:388)
    ... 6 more
[SEVERE/VCFFilterJS] 2015-05-13 20:33:39 "Initialization of VCFFilterJS failed."
[INFO/VCFFilterJS] 2015-05-13 20:33:39 "End JOB status=-1 [Wed May 13 20:33:39 CDT 2015] com.github.lindenb.jvarkit.tools.vcffilterjs.VCFFilterJS done. Elapsed time: 0.00 minutes."
[SEVERE/VCFFilterJS] 2015-05-13 20:33:39 "##### ERROR: return status = -1################"

You're not using those tools the correct way. You should do something like:

cat your.vcf | vcfcutsamples [options] | vcffilterjs [options] > output.vcf

Furthermore, if all your sites are 0/0 or undefined (./.) there will be no ALT alleles.


Thank you for your response. I have your code working as above. Unfortunately, this is not the solution I wanted, as it removes the invariant sites.

I still want all invariant sites included in my file. VCFtools and VCFlib leave a signature of the removed sample in the ALT and REF columns, even if the remaining individuals do not have the alternative allele and all remaining individuals are 0/0 or ./. These tools may still provide something other than '.' in the ALT column. I simply want to remove one individual from a VCF file and have the ALT, REF, and INFO columns recoded as if I had made the original VCF file without that individual.
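
For the INFO part, the allele counts could in principle be recomputed from the remaining genotypes. The following is a rough Python sketch under that assumption; `allele_counts` is a hypothetical helper, it only covers AN, AC, and AF, and annotations such as QD or the rank-sum tests cannot be rebuilt this way because they depend on the reads.

```python
import re
from collections import Counter


def allele_counts(genotypes, n_alt):
    """Recompute AN, AC and AF from the GT strings of the remaining samples.

    genotypes -- GT strings, e.g. ["0/0", "0/1", "./."]
    n_alt     -- number of ALT alleles listed for the site
    Returns (an, ac, af) where ac and af have one entry per ALT allele.
    """
    counts = Counter()
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele != ".":          # uncalled alleles do not count
                counts[int(allele)] += 1
    an = sum(counts.values())
    ac = [counts[i] for i in range(1, n_alt + 1)]
    af = [c / an if an else 0.0 for c in ac]
    return an, ac, af


# Hypothetical site with one ALT allele and four remaining samples.
an, ac, af = allele_counts(["0/0", "0/1", "./.", "1/1"], n_alt=1)
print(an, ac, af)                      # 6 [3] [0.5]
```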

vcfcutsamples.jar almost does this, but it leaves the original deletions (relative to the reference) from the removed individual in the REF column. As shown in my examples above, everyone is 0/0 or ./. but the REF allele still shows the deletion from the sample that was removed. I can work with that and just write some code to deal with it downstream.
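
Such downstream code might look like the Python sketch below. It is only a guess at one way to do it: `trim_invariant_ref` is a hypothetical name, the input is assumed to be tab-separated, and shortening REF to its first base is only reasonable if the following positions have their own records, as they do in the example above.

```python
def trim_invariant_ref(line):
    """Shorten a multi-base REF to its first base when ALT is '.'.

    Header lines (starting with '#') are returned unchanged. Assumes a
    tab-separated VCF record with REF in column 4 and ALT in column 5.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    if alt == "." and len(ref) > 1:
        fields[3] = ref[0]             # keep only the anchor base
    return "\t".join(fields)


# Truncated example record modelled on position 173 above.
record = "KB871578.1\t173\t.\tGC\t.\t44.11\tPASS"
print(trim_invariant_ref(record))
```

Records that still carry an ALT allele are left untouched, so sites where a remaining individual has the deletion keep their full REF.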

I'd be curious to know whether there is software out there that recodes everything properly, though, because that would be easier overall.

Thank you for your help.

9.5 years ago

My tool https://github.com/lindenb/jvarkit/wiki/VcfCutSamples removes the unused ALT alleles:

$ gunzip -c input.vcf.gz  |\
java -jar dist/vcffilterjs.jar -e 'variant.getAlternateAlleles().size()>2'  |\
java -jar dist/vcfcutsamples.jar -S B00GG81  | grep -v '#' | cut -f 4,5,10 | head

CTTTTT    CT    0/1:0,1,2,0,0:7:0:132,47,96,0,20,15,82,45,0,73,52,35,0,51,43
TGGG    TGG,T    2/1:0,1,1,1:9:5:49,33,55,5,30,65,38,27,0,34
CAAAAAAAAAAA    C,CAAAAAAAAA    1/2:0,0,5,1,0:10:22:630,645,689,32,42,177,516,517,0,499,586,588,22,515,570
CGTGTGT    .    0/0:2,0,0,0:4:6:0,6,77,6,83,166,6,80,90,84
TTGTG    TTGTGTGTG    1/1:0,0,0,0,1:3:3:46,32,29,52,35,128,52,35,106,101,3,3,6,6,0
TG    TGGGGG    1/1:0,0,0,1,0:2:3:46,17,14,49,17,57,3,3,3,0,13,12,13,3,10
GTCTC    GTC    1/1:0,0,3,0:5:9:128,132,138,9,9,0,137,147,9,188
CAAA    CA    0/1:3,1,0,0:14:14:14,0,141,35,165,294,16,79,110,84
AGCCGCCGCC    .    0/0:46,0,0,0:92:99:0,188,6588,138,4981,4841,138,3895,3831,3706
CAAA    CAA    0/1:7,3,6,2,18:80:71:350,366,682,275,439,1309,453,695,1472,2289,0,71,394,532,376