Modify IOT-formatted VCF to keep only the most frequent variant
6.4 years ago

I have a VCF from the IOT server analyzer that looks like this (I have pasted just one entry, a variant call in the KRAS gene):

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT
chr12   25398280 COSM536;COSM535;COSM12655;COSM537;COSM12721;COSM531;COSM87280;COSM532;COSM528;COSM523;COSM521;COSM25081;COSM516;COSM518;COSM517;COSM511    GCCACCAG    AACACCAG,ACCACCAG,ATCACCAG,CCCACCAG,GCCACAAG,GCCACCAA,GCCACCACCAG,GCCACGAG,GCCACTAG,GCCATAAG,GCCATCAG,GCCGCCAG,GCTACCAG,GTCACCAG,TCCACCAG,TTCACCAG  37011.90    PASS    AF=0,0.000308356,0,0,0.692569,0.00185014,0,0.000616713,0.000616713,0.00185014,0.00308356,0.000308356,0.000308356,0.0021585,0,0;AO=0,1,0,0,2241,6,0,2,2,6,10,1,1,7,0,0;DP=3294;FAO=0,1,0,0,2246,6,0,2,2,6,10,1,1,7,0,0;FDP=3243;FDVR=5,5,5,10,5,10,10,5,5,5,10,5,5,10,5,5;FR=.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.;FRO=961;FSAF=0,0,0,0,1153,3,0,0,0,4,3,1,0,1,0,0;FSAR=0,1,0,0,1093,3,0,2,2,2,7,0,1,6,0,0;FSRF=500;FSRR=461;FUNC=[{'origPos':'25398280','origRef':'GCCACCAG','normalizedRef':'C','gene':'KRAS','normalizedPos':'25398285','normalizedAlt':'A','polyphen':'1.0','gt':'pos','codon':'TGT','coding':'c.34G>T','sift':'0.01','grantham':'159.0','transcript':'NM_033360.3','function':'missense','protein':'p.Gly12Cys','location':'exonic','origAlt':'GCCACAAG','exon':'2'}];FWDB=0.00636207,-0.0290886,-0.0202011,-0.0039864,-0.00377511,-0.0316869,-0.00452498,-0.00452004,-0.0474076,-0.0194318,-0.0258133,-0.00942904,-0.0131078,-0.0130281,-0.0816905,-0.0359171;FXX=0.0154826;HRUN=1,1,1,1,2,1,0,2,2,2,2,1,2,2,1,1;HS;HS_ONLY=0;LEN=2,1,2,1,1,1,3,1,1,2,1,1,1,1,1,2;MLLD=152.15,93.9388,182.619,378.816,84.0614,371.393,214.047,102.979,41.516,59.0083,314.834,99.3045,232.878,156.428,143.283,201.003;OALT=AA,A,AT,C,A,A,ACC,G,T,TA,T,G,T,T,T,TT;OID=COSM12721,COSM536,COSM531,COSM535,COSM516,COSM511,COSM12655,COSM518,COSM517,COSM25081,COSM521,COSM523,COSM528,COSM532,COSM537,COSM87280;OMAPALT=AACACCAG,ACCACCAG,ATCACCAG,CCCACCAG,GCCACAAG,GCCACCAA,GCCACCACCAG,GCCACGAG,GCCACTAG,GCCATAAG,GCCATCAG,GCCGCCAG,GCTACCAG,GTCACCAG,TCCACCAG,TTCACCAG;OPOS=25398280,25398280,25398280,25398280,25398285,25398287,25398283,25398285,25398285,25398284,25398284,25398283,25398282,25398281,25398280,25398280;OREF=
GC,G,GC,G,C,G,-,C,C,CC,C,A,C,C,G,GC;PB=.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.;PBP=.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.;QD=45.6515;RBI=0.0125377,0.0362959,0.0427309,0.0191885,0.00404772,0.0337496,0.0108564,0.0228843,0.0495759,0.0198897,0.034311,0.0155901,0.0342593,0.0226423,0.0821846,0.0367961;REFB=0.00124768,0.0020683,0.00140984,0.00105771,0.0465428,0.00133263,-0.00173314,0.011244,0.0243957,0.0239083,0.00126847,-0.00176923,0.00355831,0.00138268,0.000713104,-0.000788604;REVB=-0.0108036,-0.0217082,-0.0376543,0.0187698,0.00146033,0.0116179,0.00986839,-0.0224334,-0.0145013,-0.0042432,-0.0226035,-0.0124155,-0.0316525,-0.0185187,-0.008999,-0.00799476;RO=959;SAF=0,0,0,0,1149,3,0,0,0,4,3,1,0,1,0,0;SAR=0,1,0,0,1092,3,0,2,2,2,7,0,1,6,0,0;SRF=498;SRR=461;SSEN=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;SSEP=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;SSSB=0,-0.0228342,0,0,-0.00370131,-0.00454445,0,-0.0445576,-0.0445576,0.0347783,-0.0789845,0.021177,-0.0228342,-0.100903,0,0;STB=0.5,0.990596,0.5,0.5,0.50208,0.520099,0.5,0.995243,0.995243,0.647057,0.714502,0.989819,0.990596,0.864521,0.5,0.5;STBP=1,0.307,1,1,0.728,0.931,1,0.106,0.106,0.516,0.221,0.484,0.307,0.076,1,1;TYPE=mnp,snp,mnp,snp,snp,snp,ins,snp,snp,mnp,snp,snp,snp,snp,snp,mnp;VARB=0,0.0932764,0,0,-0.0201705,0.025506,0,0.0347318,-0.000363269,0.0362913,0.0507715,0.0223018,0.0253235,0.0371567,0,0 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/5:10795:3294:3243:959:961:0,1,0,0,2241,6,0,2,2,6,10,1,1,7,0,0:0,1,0,0,2246,6,0,2,2,6,10,1,1,7,0,0:0,0.000308356,0,0,0.692569,0.00185014,0,0.000616713,0.000616713,0.00185014,0.00308356,0.000308356,0.000308356,0.0021585,0,0:0,1,0,0,1092,3,0,2,2,2,7,0,1,6,0,0:0,0,0,0,1149,3,0,0,0,4,3,1,0,1,0,0:498:461:0,1,0,0,1093,3,0,2,2,2,7,0,1,6,0,0:0,0,0,0,1153,3,0,0,0,4,3,1,0,1,0,0:500:461

As you can see, for every variant detected, the IOT analyzer reports (I don't know how) all variants from COSMIC, or some of them (I'm not sure), and gives the AF for each possible variant. However, there is only one COSMIC variant which has enough AF to call (0.692569). I would like to get rid of the rest of the variants. Ideally, I would like to filter also the values for the other IDs in every call, but any advice on any step would be much appreciated :)

Thanks!

EDIT: As suggested, here is an example of a VCF file:

vcf snp R bash • 1.8k views

Hello, this is not valid VCF format; check here for the correct VCF format.


only one COSMIC variant which has enough AF to call (0.692569).

You mean only one COSMIC ALT allele.

I would like to get rid of the rest of the variants.

What does that mean? If you want to remove the values, you cannot do it, because these INFO fields should use Number=A, i.e. one value per ALT allele.

Ideally, I would like to filter also the values

Please provide an example of the output.
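
For later readers, the Number=A point can be made concrete with a small sketch. It assumes that a field is per-ALT when it holds exactly as many comma-separated values as there are ALT alleles; that is only a heuristic, since the authoritative source is the `Number=` declaration in the VCF header, which is not shown in this thread. The function names and the trimmed-down INFO string (three of the sixteen alleles from the record above) are illustrative, not part of any library.

```python
def best_alt_index(info: str) -> int:
    """Return the 0-based index of the ALT allele with the highest AF."""
    # Flags such as HS have no "=", so they are skipped here.
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    afs = [float(x) for x in fields["AF"].split(",")]
    return max(range(len(afs)), key=afs.__getitem__)


def subset_per_alt(value: str, n_alt: int, keep: int) -> str:
    """Keep one entry of a comma-separated value if it has one entry per ALT.

    Heuristic (assumption): a value with exactly n_alt entries is Number=A.
    Anything else (e.g. DP) is returned unchanged.
    """
    parts = value.split(",")
    return parts[keep] if len(parts) == n_alt else value


# Trimmed-down INFO with 3 ALT alleles, values taken from the record above.
info = "AF=0.000308356,0.692569,0.00185014;AO=1,2241,6;DP=3294;HS"
keep = best_alt_index(info)
print(keep)                                  # 1
print(subset_per_alt("1,2241,6", 3, keep))   # 2241
print(subset_per_alt("3294", 3, keep))       # 3294
```

So the values can be removed, as long as the ALT column and every per-ALT field are subset with the same index and the result stays consistent with the header.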


Hi Pierre!

Thanks for the formatting.

I would like to get rid of the rest of the variants.

It means that I want to keep only one COSMIC ALT allele (the one with the highest AF). I wasn't aware of Number=A.

Here's an output example:

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT
chr12   25398280 COSM516    GCCACCAG    GCCACAAG  37011.90    PASS    AF=0.692569;AO=...

Thanks!
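
A minimal Python sketch of this kind of per-record rewrite, under several assumptions: the line is tab-separated, AF contains only numeric values, a field is treated as per-ALT when it has exactly one value per ALT allele (a heuristic standing in for the header's `Number=A`), and the ID is taken from the matching OID entry because the ID column is not in ALT order. `keep_top_allele` and `remap_gt` are hypothetical helper names, and the test line is a three-allele excerpt of the record above. Note that, in the pasted record, AF=0.692569 sits at the fifth ALT, which is GCCACAAG with OID COSM516.

```python
def remap_gt(gt: str, best: int) -> str:
    """Renumber GT after dropping all ALT alleles except index `best`."""
    sep = "|" if "|" in gt else "/"
    # REF stays 0, the kept ALT becomes 1, dropped alleles become missing.
    mapping = {"0": "0", str(best + 1): "1"}
    return sep.join(mapping.get(a, ".") for a in gt.split(sep))


def keep_top_allele(line: str) -> str:
    """Return the record with only the highest-AF ALT allele retained."""
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    n_alt = len(alts)

    # Keep INFO order; flags (no "=") are single-element entries.
    entries = [kv.split("=", 1) for kv in cols[7].split(";")]
    info = {e[0]: e[1] for e in entries if len(e) == 2}
    afs = [float(x) for x in info["AF"].split(",")]
    best = max(range(n_alt), key=lambda i: afs[i])

    def pick(value: str) -> str:
        # Heuristic: exactly n_alt values means one value per ALT allele.
        parts = value.split(",")
        return parts[best] if len(parts) == n_alt else value

    out = cols[:]
    if "OID" in info:
        oids = info["OID"].split(",")
        if len(oids) == n_alt:
            out[2] = oids[best]
    out[4] = alts[best]
    out[7] = ";".join(
        e[0] if len(e) == 1 else f"{e[0]}={pick(e[1])}" for e in entries
    )
    if len(cols) > 9:  # FORMAT plus at least one sample column
        keys = cols[8].split(":")
        for i in range(9, len(cols)):
            vals = cols[i].split(":")
            out[i] = ":".join(
                remap_gt(v, best) if k == "GT" else pick(v)
                for k, v in zip(keys, vals)
            )
    return "\t".join(out)


line = (
    "chr12\t25398280\tCOSM536;COSM516;COSM511\tGCCACCAG\t"
    "ACCACCAG,GCCACAAG,GCCACCAA\t37011.90\tPASS\t"
    "AF=0.000308356,0.692569,0.00185014;AO=1,2241,6;DP=3294;HS;"
    "OID=COSM536,COSM516,COSM511\t"
    "GT:DP:AO:AF\t0/2:3294:1,2241,6:0.000308356,0.692569,0.00185014"
)
print(keep_top_allele(line))
```

The heuristic can misfire when a field that is not per-ALT happens to have as many values as there are ALT alleles, so it is worth checking the result against the header definitions before using it downstream.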


1) It cannot be a valid VCF: if there is a FORMAT column, then one expects some genotypes/sample names after FORMAT. 2) Please put a valid snippet of the input VCF on gist.github.com.
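
As a quick way to spot the first problem, here is a small sketch that compares the column header with a record. It assumes tab-separated lines and only checks column counts, so it is nowhere near a full validator; `check_sample_columns` is a hypothetical helper name.

```python
def check_sample_columns(header_line: str, record_line: str) -> list:
    """Return a list of column-layout problems (empty if none were found)."""
    header = header_line.lstrip("#").rstrip("\n").split("\t")
    record = record_line.rstrip("\n").split("\t")
    problems = []
    # With genotype data, FORMAT (9th column) is followed by sample columns.
    if "FORMAT" in header and len(header) < 10:
        problems.append("header has FORMAT but no sample column")
    if len(record) != len(header):
        problems.append(
            f"record has {len(record)} columns, header has {len(header)}"
        )
    return problems


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
record = "chr12\t25398280\t.\tG\tA\t50\tPASS\tDP=10\tGT:DP\t0/1:10"
for problem in check_sample_columns(header, record):
    print(problem)
```

With the header as pasted in the question (ending at FORMAT) and a record that carries a sample column, both checks fire.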


Thanks for your help! As you may have guessed, working with VCF files is quite new to me... Here's an input example:

6.4 years ago

I more or less solved it using the vcfR package. In case anyone needs to do this in the future, here is the issue:

https://github.com/knausb/vcfR/issues/112
