Question

Working with a multi-sample annotated VCF file


2.0 years ago

Peerzada • 0

Hello,

Is it feasible to split a multi-sample annotated VCF file for a particular protein (samples are from the 1000 Genomes Project) into separate VCF files for a clearer view? All I need are the mutations and the samples in which they are present. I am very new to the VCF format. I have annotated the file with Ensembl VEP, and now I need to analyze it for mutations in each sample. Could you suggest a way to get the sample names for each mutation in a well-sorted form, since the annotated file is very large? Is there a script or command that filters the file this way?

Thanks and regards

vcf 1000 genome annotation • 971 views



Duplicate of Per sample information from a multi sample vcf file

2.0 years ago by barslmn ★ 2.3k


something like:

bcftools query -f '[%CHROM:%POS %INFO/CSQ %SAMPLE %GT\n]' filtered.vcf | awk '($4!="0/0" && $4!="./.")'

?

2.0 years ago by Pierre Lindenbaum 164k


After doing this, my file lists 2548 entries for a single chromosome position, which cannot be right because not every sample carries a variant at that position. So I need to remove the homozygous-reference genotypes (0|0) from this file. How can I do that? Also, for each position there is more than one annotation per sample, because different splicing patterns give different transcripts. I need to keep only the first annotation for each sample, the one relevant to my reference protein sequence, and remove the others. How can I do that?

2.0 years ago by Peerzada • 0
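
A possible way to handle both points, as a minimal sketch rather than a tested pipeline: the awk filter in the command above compares against 0/0 and ./., but the genotypes here are phased and written as 0|0, so nothing gets removed. The helper below (filter_calls is a made-up name) reads the four-column output of that bcftools query command, keeps only genotypes that carry at least one non-reference allele, and reduces the comma-separated CSQ value to a single transcript entry. It assumes CSQ contains no spaces, and that an optional transcript ID passed as an argument appears as an exact pipe-delimited field in CSQ. The ENST ID in the usage comment is a placeholder, not taken from the thread.

```shell
#!/bin/sh
# Hypothetical helper: expects lines of the form "CHROM:POS CSQ SAMPLE GT" on stdin.
# Usage (placeholder transcript ID):
#   bcftools query -f '[%CHROM:%POS %INFO/CSQ %SAMPLE %GT\n]' filtered.vcf | filter_calls ENST00000000000
filter_calls() {
    awk -v tx="${1:-}" '
    {
        # Keep only genotypes with at least one ALT allele.
        # This drops 0|0, 0/0, ./. and . whether phased or not.
        if ($4 !~ /[1-9]/) next

        # VEP writes one CSQ entry per transcript, separated by commas.
        n = split($2, csq, ",")
        pick = csq[1]                # default: first entry as listed

        if (tx != "") {
            # If a transcript ID was given, keep only the entry naming it.
            pick = ""
            for (i = 1; i <= n; i++) {
                if (index(csq[i], "|" tx "|") > 0) { pick = csq[i]; break }
            }
            if (pick == "") next     # no annotation for that transcript
        }
        print $1 " " pick " " $3 " " $4
    }'
}
```

The first CSQ entry is not guaranteed to be the transcript that matches your reference protein, so passing the transcript ID explicitly is probably the safer choice. Sorting the result by position or by sample afterwards can be done with sort.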