Hello!
I have received some custom VCFs (yes, I know, and I hate it) in which each row holds the information for a single sample at a single variant. This means that some variants are repeated across several rows.
I have the following columns (1-17):
#CHROM POS GENE REF ALT QUAL FILTER INFO PTID AF AQ GT DP AD AB GQ PL
GT is the genotype, AF is the allele frequency, DP is the depth and AB is the fraction of each allele. I don't care about the extra information, only about the fields that define the frequency or genotype of the alleles.
So for some variants:
PTID
VARIANT A SAMPLE1
VARIANT A SAMPLE2
etc
FILTER, QUAL and INFO are empty, and PTID is the ID of each sample.
What I need is a multi-sample VCF like:
#CHROM POS GENE REF ALT QUAL FILTER INFO SAMPLE1_genotype SAMPLE2_genotype etc
Is there any way of converting this to a multi-sample VCF based on the PTID? I have no idea where to start, as each variant is defined by POS, GENE, REF and ALT (I assume), and I'm worried about messing with the file and getting something wrong without noticing.
It would be easier to just have the normal VCF, but the institution that provided this custom VCF is not easily reachable.
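A minimal Python sketch of the reshaping described above (one row per sample per variant, pivoted to one row per variant with one genotype column per PTID). It rests on several assumptions that are not confirmed by the file provider: the file is tab-delimited with the header exactly as listed, a variant is identified by #CHROM, POS, GENE, REF and ALT together, and a sample with no row for a variant is written as `./.` (it may in fact be homozygous reference, which the file alone cannot tell). The function name `pivot_long_to_wide` and the demo values are made up for illustration.

```python
import csv
import io

# Fixed columns copied through unchanged (empty values become ".").
FIXED = ["#CHROM", "POS", "GENE", "REF", "ALT", "QUAL", "FILTER", "INFO"]
# Assumption: these five columns together identify one variant.
KEY = ["#CHROM", "POS", "GENE", "REF", "ALT"]


def pivot_long_to_wide(rows, value_field="GT", missing="./."):
    """Collapse one-row-per-sample records into one row per variant."""
    variants = {}  # variant key -> (fixed column values, {PTID: value})
    samples = []   # sample IDs in order of first appearance
    for row in rows:
        key = tuple(row[c] for c in KEY)
        if key not in variants:
            variants[key] = ([row.get(c) or "." for c in FIXED], {})
        calls = variants[key][1]
        ptid = row["PTID"]
        if ptid not in samples:
            samples.append(ptid)
        # Refuse to silently overwrite a conflicting call for the same sample.
        if ptid in calls and calls[ptid] != row[value_field]:
            raise ValueError(f"conflicting {value_field} for {ptid} at {key}")
        calls[ptid] = row[value_field]
    header = FIXED + samples
    records = [fixed + [calls.get(s, missing) for s in samples]
               for fixed, calls in variants.values()]
    return header, records


# Tiny invented example in the 17-column layout.
cols = FIXED + ["PTID", "AF", "AQ", "GT", "DP", "AD", "AB", "GQ", "PL"]
demo_rows = [
    ["chr1", "100", "GENE1", "A", "G", "", "", "", "SAMPLE1",
     "0.5", ".", "0/1", "30", "15,15", "0.5", "99", "."],
    ["chr1", "100", "GENE1", "A", "G", "", "", "", "SAMPLE2",
     "1.0", ".", "1/1", "28", "0,28", "1.0", "99", "."],
    ["chr2", "200", "GENE2", "C", "T", "", "", "", "SAMPLE2",
     "0.5", ".", "0/1", "40", "20,20", "0.5", "99", "."],
]
demo = "\n".join("\t".join(r) for r in [cols] + demo_rows)

reader = csv.DictReader(io.StringIO(demo), delimiter="\t")
header, records = pivot_long_to_wide(reader)
for line in [header] + records:
    print("\t".join(line))
```

For a real file, the same function can be fed `csv.DictReader(open(path, newline=""), delimiter="\t")`. The output here follows the column layout asked for above; a file that standard VCF tools accept would additionally need a `##fileformat` header line and a FORMAT column before the sample columns.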
You have many questions without comments or validation:
I'm not sure what this means, but most of my questions also did not get many answers. I always give a thumbs up when an answer works, so I'm not sure what the expectation is here! I also didn't find a very informative description of the forum rules.
You should accept the answers that solve your question and/or follow up on others' comments and advice with your own experience, for anyone who comes across your posts in the future. These posts come up in search engines, so someone else with the same problem would have an easier time figuring out what did and didn't work for you.