extract and generate a VCF file with only "PASS"
7
2
Entering edit mode
8.3 years ago
zengtony743 ▴ 80
I have a VCF file and I want to generate a new VCF file with the variants which have only FILTER as "PASS" left

I tried

 1) grep 4751snpf.vcf > 4751PASS.vcf | grep “FILTER=PASS”
 2) grep 4751snpf.vcf > 4751PASS.vcf | grep “PASS”

They both failed. Here is the VCF file looks like:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  MO_4751

chr10   3124905 .   A   C   7797.30 PASS    ABHet=0.588;ABHom=0.995;
AC=1;AF=0.500;AN=2;BaseQRankSum=-4.919;DP=8;Dels=0.00;FS=7.563;HaplotypeScore=0.
4537;InbreedingCoeff=-0.5755;MQ=55.56;MQ0=1;MQRankSum=-4.937;OND=3.265e-03;QD=12
.97;ReadPosRankSum=5.683    GT:AD:DP:GQ:PL  0/1:7,1:8:9:9,0,228
chr10   3176529 .   G   T   6361.10 PASS    ABHet=0.651;ABHom=0.864;
AC=1;AF=0.500;AN=2;BaseQRankSum=-9.379;DP=14;Dels=0.01;FS=0.292;HaplotypeScore=5
.7125;InbreedingCoeff=-0.7938;MQ=55.81;MQ0=0;MQRankSum=-7.360;OND=0.035;QD=8.60;
ReadPosRankSum=-2.204   GT:AD:DP:GQ:PL  0/1:9,5:14:99:109,0,266
chr10   3257558 .   A   C   349.31  LowCoverage ABHet=0.543;ABHo
m=1.00;AC=1;AF=0.500;AN=2;BaseQRankSum=-1.917;DP=3;Dels=0.00;FS=1.412;HaplotypeS
core=0.4039;InbreedingCoeff=0.0676;MQ=54.44;MQ0=0;MQRankSum=-2.510;QD=15.88;Read
PosRankSum=-1.127   GT:AD:DP:GQ:PL  0/1:2,1:3:26:26,0,63
chr10   3505793 .   A   C   406.07  LowCoverage ABHet=0.556;ABHo
m=0.885;AC=1;AF=0.500;AN=2;BaseQRankSum=-0.055;DP=2;Dels=0.00;FS=136.819;Haploty
peScore=0.2675;InbreedingCoeff=-0.1981;MQ=48.36;MQ0=0;MQRankSum=-3.192;OND=0.090
;QD=7.66;ReadPosRankSum=-1.637  GT:AD:DP:GQ:PL  0/1:1,1:2:21:21,0,27
vcf filter • 20k views
ADD COMMENT
0
Entering edit mode

Just found that vcftool has --remove-filtered-all. Has anyone tried it?

ADD REPLY
11
Entering edit mode
5.4 years ago
$ bcftools view -i 'ID="PASS"' input.vcf > output.vcf
ADD COMMENT
6
Entering edit mode

I like bcftools, it is so fast and can handel larege vcf files from whole genome sequencing (WGS) with high efficiency. However, the bcftools option -i 'ID="PASS"' will not work for all versions of vcf files, at least it did not work with me with illumenia WGS SNV.vcf files. To make sure it will work with all types of vcf files, I would use the following command instead:

bcftools view -f PASS input.vcf > output.vcf -Ahmed Alhendi

ADD REPLY
5
Entering edit mode
8.3 years ago
Vivek ★ 2.7k

You are grepping the file name instead of the pattern and writing it to another file before using a pipe to grep for the pattern you want. Try this command instead.

awk -F '\t' '{if($0 ~ /\#/) print; else if($7 == "PASS") print}' 4751snpf.vcf > 4751PASS.vcf

Edited to add an apastrophe.

ADD COMMENT
0
Entering edit mode

Thanks, Vivek. I am running the one that you posted It shows this, Is it normal?

$ awk -F '\t' '{if($0 ~ /#/) print; else if($7 == "PASS") print} 4751snpf.vcf > 4751PASS1.vcf &

ADD REPLY
3
Entering edit mode

There's an apostrophe missing between "print}" and "4751". I think the correct syntax is:

awk -F '\t' '{if($0 ~ /\#/) print; else if($7 == "PASS") print}' 4751snpf.vcf > 4751PASS1.vcf
ADD REPLY
0
Entering edit mode

Ah thank you, I typed that post right before leaving home and didn't get a chance to check the error.

ADD REPLY
2
Entering edit mode
8.3 years ago
spvensko ▴ 240

I believe this should work:

awk '$7=="PASS" {print $0}' 4751snpf.vcf > 4751PASS.vcf
ADD COMMENT
0
Entering edit mode

Thanks, spvensko, it works by producing the following file, however, all information on the header line lost too

chr10   3176529 .   G   T   6361.10 PASS    ABHet=0.651;ABHom=0.864;
AC=1;AF=0.500;AN=2;BaseQRankSum=-9.379;DP=14;Dels=0.01;FS=0.292;HaplotypeScore=5
.7125;InbreedingCoeff=-0.7938;MQ=55.81;MQ0=0;MQRankSum=-7.360;OND=0.035;QD=8.60;
ReadPosRankSum=-2.204   GT:AD:DP:GQ:PL  0/1:9,5:14:99:109,0,266
chr10   3696700 .   T   G   988.21  PASS    ABHet=0.604;ABHom=0.869;
AC=1;AF=0.500;AN=2;BaseQRankSum=1.305;DP=5;Dels=0.00;FS=387.692;HaplotypeScore=0
.3825;InbreedingCoeff=-0.3615;MQ=49.17;MQ0=0;MQRankSum=-5.599;OND=0.066;QD=5.95;
ReadPosRankSum=-7.691   GT:AD:DP:GQ:PL  0/1:3,2:5:44:44,0,61
chr10   4033401 .   G   A,T 16302.20    PASS    AC=1,0;AF=0.500,
0.00;AN=2;BaseQRankSum=-7.317;DP=34;Dels=0.00;FS=235.717;HaplotypeScore=12.5074;
InbreedingCoeff=-0.4964;MQ=55.46;MQ0=1;MQRankSum=-29.193;QD=6.18;ReadPosRankSum=
-5.443  GT:AD:DP:GQ:PL  0/1:27,7,0:34:99:111,0,818,186,836,1023
chr10   4403027 .   A   G   3098.16 PASS    ABHet=0.673;ABHom=0.977;
AC=1;AF=0.500;AN=2;BaseQRankSum=6.290;DP=10;Dels=0.00;FS=0.000;HaplotypeScore=6.
4682;InbreedingCoeff=-0.4427;MQ=49.74;MQ0=0;MQRankSum=5.366;OND=8.775e-03;QD=7.9
4;ReadPosRankSum=-9.160 GT:AD:DP:GQ:PL  0/1:8,2:10:42:42,0,149
ADD REPLY
1
Entering edit mode

You should follow Vivek's approach (A: extract and generate a VCF file with only "PASS") as it maintains the header lines.

ADD REPLY
0
Entering edit mode

Thanks, Vivek. I am running the one that you posted It shows this, Is it normal?

$ awk -F '\t' '{if($0 ~ /\#/) print; else if($7 == "PASS") print} 4751snpf.vcf > 4751PASS1.vcf &
> 
> 
> 
> 
> 
> 
> 
> 
>
ADD REPLY
1
Entering edit mode
4.9 years ago

Hello,
You can try the below GATK command to filter variants by 'PASS':

gatk --java-options '-Xmx20G -XX:+UseParallelGC -XX:ParallelGCThreads=8' SelectVariants -R reference.fasta -V snps.vcf.gz --exclude-filtered true -O filter.vcf.gz

Alternatively, you can use vcffilter tools:

vcffilter -f "FILTER = PASS" snps.vcf.gz > filter.vcf.gz

Hope this works

-Lakshman

ADD COMMENT
1
Entering edit mode
4.7 years ago
Ahmed Alhendi ▴ 240

If you only want to filter for PASS records then you don't need to use 'AWK '. This simply can be done using bcftools.

bcftools view -f PASS input.vcf > output.vcf
ADD COMMENT
0
Entering edit mode
5.4 years ago
osowiecki • 0
$ zcat ./4_strelka.vcf.gz | grep -P "\tPASS\t|^#" > test
ADD COMMENT
0
Entering edit mode
4.4 years ago

vcftools --remove-filtered-all --recode 4751snpf.vcf > 4751PASS.vcf

ADD COMMENT

Login before adding your answer.

Traffic: 1512 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6