filtering a VCF file
10.7 years ago
kanwarjag ★ 1.2k

I have a VCF file that I have filtered with grep for various flags/tags. The file is small, around 10,000 rows, so I can easily open it in any text editor or Excel.

Here are a few lines:

1    14662255    .    CAG    GAG,C    .    .    NS=2;AN=4;AC=2,1;CGA_XR=.,dbsnp.126|rs34561318;CGA_RPT=(GA)n|Simple_repeat|0.0    GT:PS:SS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP:CGA_ODP:CGA_OAD:CGA_ORDP:CGA_SOMC:CGA_SOMR:CGA_SOMS:CGA_SOMF    1|0:14662255:.:PASS:63:63,287:62,276:14,52:-63,0,-287,-63,-287,-287:-14,0,-52,-14,-52,-52:31:7,24:24:27:13,14:14:.:.:.:.    1|2:14662255:Somatic:SQLOW;FET30:170:170,170:167,167:47,49:-170,-170,-170,-170,0,-170:-49,-49,-49,-47,0,-47:34:21,14:21:34:15,23:23:del:0.146:-17:0
1    17299453    .    TA    T,AA    .    .    NS=2;AN=4;AC=1,1;CGA_FI=9696|NM_014675.3|CROCC|UTR3|UNKNOWN-INC,9696|NM_014675.3|CROCC|UTR3|UNKNOWN-INC    GT:PS:SS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP:CGA_ODP:CGA_OAD:CGA_ORDP:CGA_SOMC:CGA_SOMR:CGA_SOMS:CGA_SOMF    1/0:.:.:PASS:51:51,535:0,485:0,45:-51,0,-535,-51,-535,-535:0,0,-45,0,-45,-45:68:5,53:53:47:1,43:43:.:.:.:.    2/0:.:Somatic:SQLOW;FET30:66:66,348:0,279:0,52:-66,-66,-348,0,-348,-348:0,0,-52,0,-52,-52:44:7,37:37:59:2,56:56:snp:0.012:-20:14

Now I would like to filter for a gene name or keyword (say, FET20). Can I do this non-programmatically, in a GUI tool or Excel? The file contains genotype information that I cannot handle in Excel, and some columns hold multiple types of information.

Is there an easy way to filter small datasets like this?

Thanks

VCF-filtering • 3.5k views

Given the amount of monkeying around you'd probably need to do, why bother? This sounds like a simple job for grep or awk.


The problem is that when I use two filters, like grep A:B, the output loses its VCF format. Is there a way to maintain a standard VCF format after grep? The VCF format will allow me to annotate my variant calls in any other standard tool.

10.7 years ago

The suggestions to use grep are good, and a two step grep will maintain the VCF header in your output for downstream processing:

grep '^#' orig.vcf > filtered.vcf
grep -v '^#' orig.vcf | grep 'FET20' >> filtered.vcf
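
If you need two filters at once, the same idea can be done in a single pass with awk. This is only a sketch: `filter_vcf` is a hypothetical helper name, and `orig.vcf`, `FET20`, and `CROCC` are placeholders taken from this thread. Header lines (starting with `#`) always pass through, so the stages can be chained without losing the VCF header.

```shell
# Hypothetical helper: print every header line plus any record matching the pattern.
# Usage: filter_vcf PATTERN [FILE]   (reads standard input when FILE is omitted)
filter_vcf() {
    # /^#/ keeps the VCF header; $0 ~ pat keeps records where the regex matches anywhere on the line
    awk -v pat="$1" '/^#/ || $0 ~ pat' "${2:--}"
}

# Example: keep records that contain both FET20 and CROCC
# filter_vcf 'FET20' orig.vcf | filter_vcf 'CROCC' > filtered.vcf
```

Note that the pattern is matched against the whole line, so a keyword that also appears in the INFO column or in another sample's fields will match too. If you need to test one specific field, a VCF-aware tool is the safer choice.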
10.7 years ago
Pablo ★ 1.9k

You can use "SnpSift filter" for filtering VCF files. But if the filter requirement is this simple, grep can do the trick as well.
