7.0 years ago
Vasu
Hello,
This is my first time working with a VCF file. The data in the file looks like the following. I see that VCF files produced by SnpEff have an "ANN" field, but it is not present in my file, which I downloaded from ICGC. Do I need to re-annotate it with SnpEff to get the other information?
#CHROM POS ID REF ALT QUAL FILTER INFO
1 100000409 MU1214865 G A . . CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=G>A;project_count=1;studies=PCAWG;tested_donors=12198
1 100001783 MU4631949 C G . . CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=C>G;project_count=1;studies=PCAWG;tested_donors=12198
1 100003664 MU78268308 C T . . CONSEQUENCE=||||||intergenic_region||,RP11-413P11.1|ENSG00000224445|1|RP11-413P11.1-001|ENST00000438829||upstream_gene_variant||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=C>T;project_count=1;studies=PCAWG;tested_donors=12198
1 100007225 MU4631957 T C . . CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198
1 100008212 MU28770474 T C . . CONSEQUENCE=||||||intergenic_region||;OCCURRENCE=LIRI-JP|1|258|0.00388;affected_donors=1;mutation=T>C;project_count=1;studies=PCAWG;tested_donors=12198
I want to check what percentage of mutations affect TF binding sites and motifs. If I need to re-annotate, could you please give me some ideas on how to do it, and how I can check which mutations affect TFBSs and motifs?
If you need to add annotation, you can use SnpEff, VEP, or ANNOVAR. These tools have pretty good documentation, so you should be able to figure out how to use them.
It is already annotated. In the INFO field you can see the consequence, gene, transcript, studies, and mutation. I guess it was done with one of the Ensembl annotators, since the gene and transcript IDs are ENS entries; my guess is VEP. If you are looking for more annotation, you can re-annotate with VEP using more flags.
The file looks like it is already annotated.
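To see which annotations the file already carries, here is a minimal Python sketch that splits the CONSEQUENCE entry of the INFO column and counts consequence types per variant. The sub-field names in FIELDS are assumptions inferred from the sample rows above (nine pipe-separated values per entry); check the "##INFO" header lines of your file for the authoritative layout. The function names are hypothetical, not part of any tool.

```python
from collections import Counter

# Assumed sub-field order, inferred from the sample rows (not from the VCF header).
FIELDS = (
    "gene_symbol", "gene_id", "strand", "transcript_name", "transcript_id",
    "protein_id", "consequence_type", "cds_change", "aa_change",
)


def parse_info(info):
    """Split an INFO string into a dict; flags without '=' map to ''."""
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def parse_consequences(info):
    """Return one dict per comma-separated CONSEQUENCE entry."""
    raw = parse_info(info).get("CONSEQUENCE", "")
    if not raw:
        return []
    return [dict(zip(FIELDS, entry.split("|"))) for entry in raw.split(",")]


def count_consequence_types(vcf_lines):
    """Count variants per consequence type (each type once per variant)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        # Split on any whitespace; the INFO values shown above contain none.
        columns = line.split(None, 7)
        if len(columns) < 8:
            continue
        types = {
            entry.get("consequence_type", "")
            for entry in parse_consequences(columns[7])
        }
        counts.update(t for t in types if t)
    return counts
```

Calling count_consequence_types(open("variants.vcf")) on a file (the file name is a placeholder) gives a tally of the consequence types present; dividing a count by the number of variants gives the percentage for that type. If no regulatory or TF-binding term shows up in the tally, the existing annotation does not cover it and re-annotation would be needed.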
Thank you all for the replies. I know it is annotated, but I was confused because a VCF file annotated with SnpEff has an "ANN" field. I would like to check for mutations that affect transcription factor binding sites. Any idea how to do this and which tool to use?
Have you tried filtering the output with "SnpSift"?
Hello arup,
Yes, I'm using SnpSift for filtering now, but I don't see anything for checking mutations that affect TF binding sites or motifs. Could you help me with this? After re-annotation, the data looks like the following.