working with multi sample annotated vcf file
2.0 years ago
Peerzada • 0

Hello,

Is it feasible to split a multi-sample annotated VCF file for a particular protein (the samples are from the 1000 Genomes Project) into separate VCF files for a clearer view? All I need are the mutations and the samples in which they are present. I am very new to the VCF format. I have annotated the file with Ensembl VEP, and now I need to analyze it for the mutations in each sample. Could you suggest a way to get the sample names for each mutation in a well-organized form? The annotated file is very large, so any script or command that filters it this way would help.

Thanks and regards

vcf 1000 genome annotation • 971 views

something like:

bcftools query -f '[%CHROM:%POS %INFO/CSQ %SAMPLE %GT\n]' filtered.vcf | awk '($4!="0/0" && $4!="./.")' 

?


After doing this, my file shows 2548 rows for a single chromosome position, which cannot be right, since not every sample carries a variant at that position. So I need to remove the homozygous-reference genotypes, i.e. 0|0, from this file. Could you suggest a way to do this? Also, for each position there is more than one annotation per sample, because different splicing patterns give different transcripts. I need to keep only the first annotation for each sample, the one relevant to my reference protein sequence, and remove the others. How can I do that?
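One possible way to handle both points, as a rough sketch rather than a definitive answer: 1000 Genomes genotypes are phased, so they are written as `0|0` rather than `0/0`, and a filter that only tests for `0/0` lets them through. The two shell functions below (`keep_carriers` and `samples_per_variant` are made-up names for illustration) assume the `bcftools query` output is tab-separated with the columns CHROM:POS, CSQ, SAMPLE, GT. The first drops hom-ref and missing genotypes, phased or unphased, and trims CSQ to its first comma-separated entry; the second collapses the rows to one line per variant with a comma-separated sample list. Note that the first CSQ entry is not guaranteed to be the transcript matching your reference protein, so check the transcript ID, or consider re-running VEP with `--pick` to get one annotation per variant.

```shell
# Assumed input: tab-separated rows of CHROM:POS, CSQ, SAMPLE, GT, e.g. from
#   bcftools query -f '[%CHROM:%POS\t%INFO/CSQ\t%SAMPLE\t%GT\n]' filtered.vcf

# Keep only rows where the sample carries a non-reference allele,
# and reduce the CSQ field to its first comma-separated entry.
keep_carriers() {
  awk -F'\t' -v OFS='\t' '
    {
      gt = $4
      gsub("[0.|/]", "", gt)   # strip ref alleles, missing calls, separators
      if (gt == "") next       # nothing left: hom-ref or missing genotype
      sub(/,.*/, "", $2)       # keep only the first CSQ entry
      print
    }'
}

# Collapse carrier rows to one line per variant:
# CHROM:POS, CSQ, comma-separated list of carrier samples.
samples_per_variant() {
  awk -F'\t' -v OFS='\t' '
    {
      key = $1 OFS $2
      if (!(key in s)) { order[++n] = key; s[key] = $3 }
      else             { s[key] = s[key] "," $3 }
    }
    END { for (i = 1; i <= n; i++) print order[i], s[order[i]] }'
}

# Intended use (requires bcftools and your filtered.vcf):
#   bcftools query -f '[%CHROM:%POS\t%INFO/CSQ\t%SAMPLE\t%GT\n]' filtered.vcf \
#     | keep_carriers | samples_per_variant > samples_by_variant.tsv
```

Using tabs instead of spaces as the separator keeps the columns intact even if an annotation field happens to contain a space.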
