Hello everyone! I am new to working with VCF files, and I was recommended EBIvariation/vcf-validator to check that the file is correctly formatted. My variant calling (I don't run it myself; it's the output of a service we pay for) produced a VCF file with many repeated values in its INFO field, for example:
AA=p.K2811fs*46,p.K2811fs*46; CDS=c.8426delA,c.8426delA; CNT=1,1
Apparently, having "p.K2811fs*46" twice is not valid, so I should keep only one.
I haven't found any tool that does this yet (I'm not sure one even exists), so any help is very welcome!
Hello daianagan,
could you please post the complete header from the VCF file and the first 5-10 variants?
fin swimmer
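In case it helps with pulling out that excerpt, here is a minimal sketch in plain Python. The function name `header_and_first_records` is made up for illustration, and it assumes an uncompressed, standard VCF in which all header lines start with `#` and come before the variant records.

```python
from typing import Iterable, Iterator


def header_and_first_records(lines: Iterable[str], n_records: int = 10) -> Iterator[str]:
    """Yield every header line, then at most n_records variant lines."""
    seen = 0
    for line in lines:
        if line.startswith("#"):
            # Meta-information (##...) and the #CHROM column line
            yield line
        else:
            if seen >= n_records:
                break  # enough variant records collected
            yield line
            seen += 1
```

For example, `"".join(header_and_first_records(open("calls.vcf"), 10))` would give the text to paste or attach, where `calls.vcf` is a placeholder file name.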
Sorry, I didn't realize I was answering as a new comment
Hello Fin! Thanks for your reply, here is what you asked for. I've attached it, since the formatting got mangled when I pasted it here.
Hello daianagan,
in your example VCF I could not find any repeated information. Am I overlooking something? If not every entry has repeated information, please add some examples that do.
fin swimmer
So sorry about that. It's updated now: the last variant has, among others, the AA info duplicated. Thank you!
This is a VEP-annotated VCF, and this example VCF doesn't have the OP entries.
Hi cpad! Thank you for your reply. If it is not too much to ask, could you briefly explain what a VEP-annotated VCF means? Why would this cause any trouble? Also, what are OP entries? Thank you!
The original VCF was functionally annotated with VEP: the tags in the OP (original post) are in line with VEP output. The duplicate entries you posted at the start are not present in the VCF file you shared. The duplicated p. notation may be due to multiple transcripts being affected by that variant. One needs to be careful before annotating the output.
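If the repeats do turn out to be redundant, collapsing them does not need a dedicated tool. Below is a minimal sketch in plain Python, not an established utility: the names `dedupe_info` and `dedupe_vcf_line` are made up for illustration, and it assumes a tab-delimited VCF with INFO in the eighth column, entries separated by `;` and values by `,`. Two caveats: repeated values may each correspond to a different transcript, so check that before dropping them, and changing the number of values may conflict with the `Number` declared for that key in the header.

```python
def dedupe_info(info: str) -> str:
    """Collapse repeated comma-separated values within each INFO key."""
    if info == ".":
        return info  # missing INFO, nothing to do
    fields = []
    for entry in info.split(";"):
        entry = entry.strip()  # tolerate stray spaces after ';'
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        if sep:
            # dict.fromkeys drops repeats while keeping first-seen order
            unique = list(dict.fromkeys(value.split(",")))
            entry = f"{key}={','.join(unique)}"
        fields.append(entry)  # flags without '=' pass through unchanged
    return ";".join(fields)


def dedupe_vcf_line(line: str) -> str:
    """Apply dedupe_info to the INFO column of one VCF line."""
    if line.startswith("#"):
        return line  # header lines are left untouched
    cols = line.rstrip("\n").split("\t")
    if len(cols) >= 8:
        cols[7] = dedupe_info(cols[7])
    return "\t".join(cols) + "\n"


print(dedupe_info("AA=p.K2811fs*46,p.K2811fs*46;CDS=c.8426delA,c.8426delA;CNT=1,1"))
# -> AA=p.K2811fs*46;CDS=c.8426delA;CNT=1
```

Running `dedupe_vcf_line` over every line of the file and writing the results to a new file keeps the original intact for comparison; re-running vcf-validator on the new file would then show whether the remaining value counts still match the header.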