Hello,
Is it feasible to split a multi-sample annotated VCF file for a particular protein (the samples are from 1000 Genomes) into separate per-sample VCF files for a clearer view? All I need are the mutations and the samples in which they are present. I am very new to the VCF format. I have annotated the file with Ensembl VEP, and now I need to analyze it for the mutations in each sample. Could you suggest a way to get the sample names for each mutation in a well-sorted form? The annotated file is very large. Is there a script or command that filters the file this way?
Thanks and regards
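One way to get "mutation plus the samples that carry it" without splitting the file is to walk the VCF line by line and read the GT entry of each sample column. The following is only a minimal sketch in plain Python, not a tested pipeline: the function name carriers_per_variant and the sample names S1 to S3 in the demo are made up for illustration, and it assumes an uncompressed, tab-separated VCF with a GT field in FORMAT.

```python
def carriers_per_variant(vcf_lines):
    """Yield (chrom, pos, ref, alt, [samples with a non-reference allele])."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names start at the 10th column
            continue
        chrom, pos, _id, ref, alt = cols[:5]
        gt_idx = cols[8].split(":").index("GT")  # position of GT in FORMAT
        carriers = []
        for name, call in zip(samples, cols[9:]):
            gt = call.split(":")[gt_idx]
            alleles = gt.replace("|", "/").split("/")
            # "0" is the reference allele, "." is a missing call
            if any(a not in ("0", ".") for a in alleles):
                carriers.append(name)
        yield chrom, int(pos), ref, alt, carriers


# Tiny made-up VCF (hypothetical samples S1..S3) to show the output shape.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
demo_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(header + ["S1", "S2", "S3"]),
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0|0", "0|1", "1|1"]),
    "\t".join(["1", "200", ".", "C", "T", ".", "PASS", ".", "GT", "1|0", "0|0", "0|0"]),
]

for chrom, pos, ref, alt, carriers in carriers_per_variant(demo_vcf):
    print(chrom, pos, ref, alt, ",".join(carriers), sep="\t")
```

For a real file you would pass an open file handle instead of the demo list (for a .vcf.gz file, gzip.open(path, "rt") from the standard library should work). Each output row is one variant followed by the samples carrying it, so the rows stay in the order of the input file.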
Duplicate of "Per sample information from a multi sample vcf file"
something like:
?
After doing this, my file shows 2548 variant entries for a single chromosome position, which cannot be right because not every sample carries a variant at that position. So I need to remove the homozygous-reference genotypes (0|0) from this file. Could you tell me how to do this? Also, at each chromosome position there is more than one variant entry per sample, because different splicing patterns give different transcripts. I need to keep only the first mutation for each sample, the one relevant to my reference protein sequence, and remove the others. How can I do that?
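Both filters can be sketched in one pass: skip any sample whose genotype is only reference or missing (0|0, ./.), and from the VEP CSQ annotation keep only the entry for one chosen transcript. This is a rough sketch under several assumptions: the CSQ field order is read from the "Format: ..." text in the ##INFO=<ID=CSQ header line that VEP writes, the transcript is matched on the Feature field, and the names variant_calls, parse_csq_format, TX_MAIN and TX_OTHER are placeholders invented here. You would substitute the Ensembl transcript ID that corresponds to your reference protein.

```python
def parse_csq_format(header_line):
    """Return the CSQ sub-field names listed after 'Format: ' in the header."""
    desc = header_line.split("Format: ", 1)[1]
    return desc.rstrip('">\n ').split("|")


def variant_calls(vcf_lines, transcript_id):
    """Yield (chrom, pos, ref, alt, sample, genotype, consequence) for
    non-reference calls, annotated only with the chosen transcript."""
    samples, csq_fields = [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            csq_fields = parse_csq_format(line)
        if not line or line.startswith("##"):
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]
            continue
        # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no '=')
        info = dict(kv.split("=", 1) for kv in cols[7].split(";") if "=" in kv)
        consequence = None
        for entry in info.get("CSQ", "").split(","):  # one entry per transcript
            rec = dict(zip(csq_fields, entry.split("|")))
            if rec.get("Feature") == transcript_id:
                consequence = rec.get("Consequence")
                break
        if consequence is None:
            continue  # variant is not annotated on the chosen transcript
        gt_idx = cols[8].split(":").index("GT")
        for name, call in zip(samples, cols[9:]):
            gt = call.split(":")[gt_idx]
            if set(gt.replace("|", "/").split("/")) <= {"0", "."}:
                continue  # drop homozygous-reference and missing genotypes
            yield cols[0], int(cols[1]), cols[3], cols[4], name, gt, consequence


# Made-up annotated record: two transcripts, three hypothetical samples.
demo_annotated = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|Feature|HGVSp">',
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "S1", "S2", "S3"]),
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS",
               "CSQ=G|missense_variant|TX_MAIN|p.Lys10Arg,G|intron_variant|TX_OTHER|",
               "GT", "0|0", "0|1", "1|1"]),
]

for row in variant_calls(demo_annotated, "TX_MAIN"):
    print(*row, sep="\t")
```

In the demo, S1 is dropped because its genotype is 0|0, and the TX_OTHER annotation is ignored, so each remaining sample appears once per position. Matching on a transcript ID is a design choice made here; "the first annotation" in the CSQ list is not guaranteed to be the transcript of your protein. VEP also has options such as --pick that reduce the output to one consequence per variant at annotation time, but whether the picked transcript matches your reference protein would need checking.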