I have a VCF file that I have filtered with grep for various flags/tags. The file is small, around 10,000 rows, so I can easily open it in any text editor or in Excel.
Here are a few lines:
1 14662255 . CAG GAG,C . . NS=2;AN=4;AC=2,1;CGA_XR=.,dbsnp.126|rs34561318;CGA_RPT=(GA)n|Simple_repeat|0.0 GT:PS:SS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP:CGA_ODP:CGA_OAD:CGA_ORDP:CGA_SOMC:CGA_SOMR:CGA_SOMS:CGA_SOMF 1|0:14662255:.:PASS:63:63,287:62,276:14,52:-63,0,-287,-63,-287,-287:-14,0,-52,-14,-52,-52:31:7,24:24:27:13,14:14:.:.:.:. 1|2:14662255:Somatic:SQLOW;FET30:170:170,170:167,167:47,49:-170,-170,-170,-170,0,-170:-49,-49,-49,-47,0,-47:34:21,14:21:34:15,23:23:del:0.146:-17:0
1 17299453 . TA T,AA . . NS=2;AN=4;AC=1,1;CGA_FI=9696|NM_014675.3|CROCC|UTR3|UNKNOWN-INC,9696|NM_014675.3|CROCC|UTR3|UNKNOWN-INC GT:PS:SS:FT:GQ:HQ:EHQ:CGA_CEHQ:GL:CGA_CEGL:DP:AD:CGA_RDP:CGA_ODP:CGA_OAD:CGA_ORDP:CGA_SOMC:CGA_SOMR:CGA_SOMS:CGA_SOMF 1/0:.:.:PASS:51:51,535:0,485:0,45:-51,0,-535,-51,-535,-535:0,0,-45,0,-45,-45:68:5,53:53:47:1,43:43:.:.:.:. 2/0:.:Somatic:SQLOW;FET30:66:66,348:0,279:0,52:-66,-66,-348,0,-348,-348:0,0,-52,0,-52,-52:44:7,37:37:59:2,56:56:snp:0.012:-20:14
Now I would like to filter for gene names or words (say FET20). Can I do this non-programmatically, in a GUI tool or in Excel? The genotype information is something I cannot handle in Excel, and some columns pack several types of information together.
Is there an easy way to filter these small datasets?
Thanks
Given the amount of monkeying around you'd probably need to do, why bother? This sounds like a simple job for grep or awk.
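A minimal sketch of what the grep/awk approach amounts to, written in Python for illustration: lines starting with # (the ## meta-information lines and the #CHROM header) always pass through, and a data line is kept only if it contains every requested word. The names filter_vcf and filter_vcf_file are hypothetical. Like grep, it matches plain substrings anywhere on the line, so FET3 would also match FET30.

```python
def filter_vcf(lines, words):
    """Yield VCF lines, keeping every header line and only matching data lines.

    A data line matches when it contains every word as a plain substring,
    like chaining `grep word1 | grep word2`, except that lines starting
    with '#' (the ## meta-information and the #CHROM header) always pass.
    """
    for line in lines:
        if line.startswith("#") or all(word in line for word in words):
            yield line


def filter_vcf_file(in_path, out_path, words):
    """Hypothetical helper: filter in_path and write the result to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_vcf(src, words))
```

Calling filter_vcf_file("in.vcf", "out.vcf", ["FET30", "CROCC"]) corresponds to awk '/^#/ || (/FET30/ && /CROCC/)' in.vcf > out.vcf, which likewise prints the header lines plus the data lines matching both patterns.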
The problem is that when I use two filters, like grep A:B, the output loses its VCF format. Is there a way to maintain a standard VCF format after grep? A valid VCF would allow me to annotate my variant calls in any other standard tool.
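To keep a valid VCF while filtering on specific fields rather than raw text, the header lines can be passed through untouched and each data line split on tabs (VCF columns are tab-separated, even though the lines pasted above show spaces). This Python sketch assumes, based only on the example CGA_FI value, that each comma-separated CGA_FI entry has the layout geneId|transcript|symbol|component|impact, so the gene symbol is the third |-separated field. It also reads the FT value of one sample column by pairing it with the colon-separated FORMAT keys. The function and parameter names are hypothetical. Because FT is compared as whole ;-separated tokens, FET3 no longer matches FET30.

```python
def gene_symbols(info):
    """Gene symbols listed in the CGA_FI key of a VCF INFO column.

    Assumption (inferred from the example line): CGA_FI is a comma-separated
    list of geneId|transcript|symbol|component|impact entries.
    """
    symbols = set()
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key != "CGA_FI":
            continue
        for entry in value.split(","):
            parts = entry.split("|")
            if len(parts) > 2:
                symbols.add(parts[2])
    return symbols


def sample_filters(fields, sample_index):
    """FT tokens (split on ';') for one sample column, 0-based after FORMAT."""
    if len(fields) <= 9 + sample_index:
        return set()  # no such sample column
    keys = fields[8].split(":")  # FORMAT, e.g. GT:PS:SS:FT:...
    values = fields[9 + sample_index].split(":")
    ft = dict(zip(keys, values)).get("FT", ".")
    return set(ft.split(";"))


def select(lines, genes=None, ft_tag=None, sample_index=1):
    """Yield header lines unchanged and data lines passing every given filter."""
    wanted = set(genes or [])
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and #CHROM header keep the VCF valid
            continue
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if wanted and not (gene_symbols(fields[7]) & wanted):
            continue
        if ft_tag and ft_tag not in sample_filters(fields, sample_index):
            continue
        yield line
```

With the two example records saved tab-delimited, select(lines, genes=["CROCC"], ft_tag="FET30") would keep the header plus the CROCC record only, since the first record has no CGA_FI entry. Writing the result with writelines gives a file whose header is unchanged, so standard VCF tools should still accept it.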