I have a VCF from the IOT server analyzer which looks like this (I paste just one entry, a variant call in the KRAS gene):
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr12 25398280 COSM536;COSM535;COSM12655;COSM537;COSM12721;COSM531;COSM87280;COSM532;COSM528;COSM523;COSM521;COSM25081;COSM516;COSM518;COSM517;COSM511 GCCACCAG AACACCAG,ACCACCAG,ATCACCAG,CCCACCAG,GCCACAAG,GCCACCAA,GCCACCACCAG,GCCACGAG,GCCACTAG,GCCATAAG,GCCATCAG,GCCGCCAG,GCTACCAG,GTCACCAG,TCCACCAG,TTCACCAG 37011.90 PASS AF=0,0.000308356,0,0,0.692569,0.00185014,0,0.000616713,0.000616713,0.00185014,0.00308356,0.000308356,0.000308356,0.0021585,0,0;AO=0,1,0,0,2241,6,0,2,2,6,10,1,1,7,0,0;DP=3294;FAO=0,1,0,0,2246,6,0,2,2,6,10,1,1,7,0,0;FDP=3243;FDVR=5,5,5,10,5,10,10,5,5,5,10,5,5,10,5,5;FR=.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.;FRO=961;FSAF=0,0,0,0,1153,3,0,0,0,4,3,1,0,1,0,0;FSAR=0,1,0,0,1093,3,0,2,2,2,7,0,1,6,0,0;FSRF=500;FSRR=461;FUNC=[{'origPos':'25398280','origRef':'GCCACCAG','normalizedRef':'C','gene':'KRAS','normalizedPos':'25398285','normalizedAlt':'A','polyphen':'1.0','gt':'pos','codon':'TGT','coding':'c.34G>T','sift':'0.01','grantham':'159.0','transcript':'NM_033360.3','function':'missense','protein':'p.Gly12Cys','location':'exonic','origAlt':'GCCACAAG','exon':'2'}];FWDB=0.00636207,-0.0290886,-0.0202011,-0.0039864,-0.00377511,-0.0316869,-0.00452498,-0.00452004,-0.0474076,-0.0194318,-0.0258133,-0.00942904,-0.0131078,-0.0130281,-0.0816905,-0.0359171;FXX=0.0154826;HRUN=1,1,1,1,2,1,0,2,2,2,2,1,2,2,1,1;HS;HS_ONLY=0;LEN=2,1,2,1,1,1,3,1,1,2,1,1,1,1,1,2;MLLD=152.15,93.9388,182.619,378.816,84.0614,371.393,214.047,102.979,41.516,59.0083,314.834,99.3045,232.878,156.428,143.283,201.003;OALT=AA,A,AT,C,A,A,ACC,G,T,TA,T,G,T,T,T,TT;OID=COSM12721,COSM536,COSM531,COSM535,COSM516,COSM511,COSM12655,COSM518,COSM517,COSM25081,COSM521,COSM523,COSM528,COSM532,COSM537,COSM87280;OMAPALT=AACACCAG,ACCACCAG,ATCACCAG,CCCACCAG,GCCACAAG,GCCACCAA,GCCACCACCAG,GCCACGAG,GCCACTAG,GCCATAAG,GCCATCAG,GCCGCCAG,GCTACCAG,GTCACCAG,TCCACCAG,TTCACCAG;OPOS=25398280,25398280,25398280,25398280,25398285,25398287,25398283,25398285,25398285,25398284,25398284,25398283,25398282,25398281,25398280,25398280;OREF=GC,G,GC,G,C,G,-
,C,C,CC,C,A,C,C,G,GC;PB=.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.;PBP=.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.;QD=45.6515;RBI=0.0125377,0.0362959,0.0427309,0.0191885,0.00404772,0.0337496,0.0108564,0.0228843,0.0495759,0.0198897,0.034311,0.0155901,0.0342593,0.0226423,0.0821846,0.0367961;REFB=0.00124768,0.0020683,0.00140984,0.00105771,0.0465428,0.00133263,-0.00173314,0.011244,0.0243957,0.0239083,0.00126847,-0.00176923,0.00355831,0.00138268,0.000713104,-0.000788604;REVB=-0.0108036,-0.0217082,-0.0376543,0.0187698,0.00146033,0.0116179,0.00986839,-0.0224334,-0.0145013,-0.0042432,-0.0226035,-0.0124155,-0.0316525,-0.0185187,-0.008999,-0.00799476;RO=959;SAF=0,0,0,0,1149,3,0,0,0,4,3,1,0,1,0,0;SAR=0,1,0,0,1092,3,0,2,2,2,7,0,1,6,0,0;SRF=498;SRR=461;SSEN=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;SSEP=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;SSSB=0,-0.0228342,0,0,-0.00370131,-0.00454445,0,-0.0445576,-0.0445576,0.0347783,-0.0789845,0.021177,-0.0228342,-0.100903,0,0;STB=0.5,0.990596,0.5,0.5,0.50208,0.520099,0.5,0.995243,0.995243,0.647057,0.714502,0.989819,0.990596,0.864521,0.5,0.5;STBP=1,0.307,1,1,0.728,0.931,1,0.106,0.106,0.516,0.221,0.484,0.307,0.076,1,1;TYPE=mnp,snp,mnp,snp,snp,snp,ins,snp,snp,mnp,snp,snp,snp,snp,snp,mnp;VARB=0,0.0932764,0,0,-0.0201705,0.025506,0,0.0347318,-0.000363269,0.0362913,0.0507715,0.0223018,0.0253235,0.0371567,0,0 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/5:10795:3294:3243:959:961:0,1,0,0,2241,6,0,2,2,6,10,1,1,7,0,0:0,1,0,0,2246,6,0,2,2,6,10,1,1,7,0,0:0,0.000308356,0,0,0.692569,0.00185014,0,0.000616713,0.000616713,0.00185014,0.00308356,0.000308356,0.000308356,0.0021585,0,0:0,1,0,0,1092,3,0,2,2,2,7,0,1,6,0,0:0,0,0,0,1149,3,0,0,0,4,3,1,0,1,0,0:498:461:0,1,0,0,1093,3,0,2,2,2,7,0,1,6,0,0:0,0,0,0,1153,3,0,0,0,4,3,1,0,1,0,0:500:461
As you can see, for every variant detected (I don't know how), the IOT analyzer reports all (or some, I'm not sure) of the variants from COSMIC and reports an AF for each possible variant. However, only one COSMIC variant has a high enough AF to be called (0.692569). I would like to get rid of the rest of the variants. Ideally, I would also like to remove the values belonging to the other IDs from every call, but advice on any step would be greatly appreciated :)
Thanks!
EDIT: As suggested, here is an example of a VCF file:
Hello, this is not a valid VCF format, check here for the right VCF format.
you mean only one COSMIC ALT allele
What does it mean? You want to remove the values? You cannot do it, because the INFO fields should use Number=A -> one value per ALT allele. Provide an example of output.
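To make the Number=A constraint concrete: dropping ALT alleles only stays consistent if every per-allele INFO value is dropped at the same index. Below is a minimal Python sketch of that idea, not a complete VCF tool. The function name keep_top_allele and its parameters are hypothetical, the set of Number=A keys is assumed to be supplied by the caller (in practice it would come from the ##INFO header lines), and AF is assumed to hold numeric values.

```python
def keep_top_allele(alt, info, number_a_keys, af_key="AF"):
    """Reduce ALT and the per-allele INFO fields to the allele with the highest AF.

    alt           -- the ALT column, e.g. "A,C,T"
    info          -- the INFO column, e.g. "AF=0.1,0.6,0.3;DP=100"
    number_a_keys -- INFO keys assumed to be declared Number=A in the header
    """
    alts = alt.split(",")

    # Split INFO into (key, "=" or "", value); flags such as HS have no value.
    fields = [item.partition("=") for item in info.split(";")]

    # Find the index of the ALT allele with the highest allele frequency.
    af_values = next(v for k, _, v in fields if k == af_key).split(",")
    best = max(range(len(alts)), key=lambda i: float(af_values[i]))

    new_fields = []
    for key, sep, value in fields:
        if key in number_a_keys:
            parts = value.split(",")
            # Only subset fields that really carry one value per ALT allele.
            if len(parts) == len(alts):
                value = parts[best]
        new_fields.append(key + sep + value)

    return alts[best], ";".join(new_fields)


# Toy record with three ALT alleles (values shortened for readability).
new_alt, new_info = keep_top_allele(
    "A,C,T",
    "AF=0.001,0.69,0.002;AO=1,2241,6;DP=3294;HS",
    number_a_keys={"AF", "AO"},
)
print(new_alt, new_info)
```

The same index would also have to be applied to the per-allele values in the sample column, and a genotype such as 0/5 would have to be renumbered to 0/1; the sketch leaves both out.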
Hi Pierre!
Thanks for the formatting.
I mean that I want to keep only one COSMIC ALT allele (the one with the highest AF). I wasn't aware of Number=A. Here's an output example:
Thanks!
1) It cannot be a valid VCF: if there is a FORMAT column, then one expects some genotypes/sample names after FORMAT. 2) Please put a valid snippet of the input VCF on gist.github.com.
Thanks for your help! As you may have guessed, working with VCF files is quite new to me... Here's an input example: