6.2 years ago
LimMo
Hello all,
I have many VCF files. I need any tool that can annotate the files and give me the gene names and the number of mutations within each gene in those VCF files.
Any ideas or suggestions will be appreciated.
And googling "annotation vcf" did not bring up any tools?
Yes, I tried some of them, but the main problem is that they can't provide the number of mutations per gene.
You can annotate with all of those, and then you just need some unix magic to count the genes, finally piping to uniq -c. Something like this:
I guess just the grep has to be adapted to suit the annotation format you got.
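As a rough sketch of that kind of pipeline (not the command from the reply above), assuming the VCF was annotated with SnpEff, so that the INFO column carries an ANN= entry whose fourth |-separated subfield is the gene name. The function name count_variants_per_gene is made up for illustration. Each variant is counted once per gene, even if it hits several transcripts of that gene.

```shell
# Count variants per gene in a SnpEff-annotated VCF.
# Assumption: INFO contains ANN=Allele|Annotation|Impact|Gene_Name|...
count_variants_per_gene() {
    grep -v '^#' "$1" |
    awk -F'\t' '{
        # Find the ANN=... entry in the INFO column (column 8).
        n = split($8, info, ";")
        for (i = 1; i <= n; i++) {
            if (info[i] ~ /^ANN=/) {
                sub(/^ANN=/, "", info[i])
                # One comma-separated annotation per transcript/effect.
                m = split(info[i], ann, ",")
                split("", seen)          # reset the per-variant gene set
                for (j = 1; j <= m; j++) {
                    split(ann[j], f, "|")
                    if (f[4] != "" && !(f[4] in seen)) {
                        seen[f[4]] = 1
                        print f[4]       # emit each gene once per variant
                    }
                }
            }
        }
    }' |
    sort | uniq -c
}
```

Usage would be something like count_variants_per_gene sample.ann.vcf. For VEP output the annotations sit in a CSQ entry whose field order is declared in the VCF header, so the ANN= pattern and the subfield index would have to be adapted.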
Have you googled "VCF annotation"? This then suggests SnpEff, ANNOVAR, vcfanno, and VEP from Ensembl. I suggest you use the latter. It is quite powerful.
I tried SnpEff, ANNOVAR and VEP. They can all annotate the VCF files and give me the gene names, but they don't produce the number of mutations per gene.
None of them does AFAIK. Probably because "gene" is quite a flexible term. Depending on the goal, this can be only the exons, only the coding exons, both introns and exons etc. You'll need to do some custom intersections with the coordinates that are of interest for you. Check out bedtools intersect.
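To illustrate what "coordinates of interest" could look like in practice, here is a minimal sketch that turns the exon records of a GTF annotation into a BED file. It assumes an Ensembl/GENCODE-style GTF where column 9 contains a gene_name "..." attribute; the function name gtf_exons_to_bed is hypothetical. Swapping "exon" for "CDS" or "gene" would give coding exons or whole gene bodies instead.

```shell
# Convert exon records of a GTF into a 4-column BED:
# chrom, 0-based start, end, gene name.
# Assumption: column 9 carries a gene_name "..." attribute.
gtf_exons_to_bed() {
    awk -F'\t' -v OFS='\t' '
        $0 !~ /^#/ && $3 == "exon" {
            name = "."
            # Extract the value between the quotes of gene_name "XYZ"
            if (match($9, /gene_name "[^"]+"/)) {
                name = substr($9, RSTART + 11, RLENGTH - 12)
            }
            # GTF is 1-based inclusive, BED is 0-based half-open.
            print $1, $4 - 1, $5, name
        }' "$1" |
    sort -u | sort -k1,1 -k2,2n
}
```

The sort -u step drops exons that are listed identically under several transcripts, so they do not show up twice. Exons that merely overlap are kept as separate intervals here, which matters if you later sum counts per gene.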
OK, I understand. I had a quick look at bedtools intersect; as you said, it will help if I'm interested in a specific region such as exons, i.e. it helps to process the file. But the question remains: how can I get the number of mutations within that region?
Given that you have a file with the start and end of your genes (genes.bed) and your VCF files:
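A minimal sketch of such a count (not the command originally posted in this reply), assuming genes.bed has one interval per gene and the VCFs are plain-text files with a proper header. The -c option of bedtools intersect appends, to every interval in -a, the number of overlapping records in -b. The function name and the .gene_counts.bed output suffix are made up for illustration.

```shell
# For each gene interval in a BED file, count the overlapping VCF records.
# Writes one <sample>.gene_counts.bed file per input VCF.
count_variants_in_genes() {
    genes_bed=$1
    shift
    for vcf in "$@"; do
        # -c: report, for each entry in -a, the number of hits in -b.
        bedtools intersect -a "$genes_bed" -b "$vcf" -c \
            > "${vcf%.vcf}.gene_counts.bed"
    done
}
```

Called as count_variants_in_genes genes.bed *.vcf, the last column of each output file is the number of variants falling inside that interval. Chromosome names have to match between the BED and the VCFs (chr1 vs. 1), otherwise every count will be 0.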
I tried what you proposed, but I got this error:
Please give examples of all used files. The error indicates malformatted files AFAIK. Do the VCFs have headers?
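A quick way to answer the header question is a check along these lines. This is only a sketch for uncompressed VCFs: it looks for the ##fileformat=VCF line that the VCF specification requires as the first line, and for the #CHROM column header line. The function name check_vcf_header is hypothetical.

```shell
# Report whether a plain-text VCF has the two required header elements.
check_vcf_header() {
    first=$(head -n 1 "$1")
    case $first in
        '##fileformat=VCF'*) ;;   # first line is as expected
        *)
            echo "$1: first line is not a ##fileformat=VCF line"
            return 1
            ;;
    esac
    if ! grep -q '^#CHROM' "$1"; then
        echo "$1: no #CHROM column header line"
        return 1
    fi
    echo "$1: header looks OK"
}
```

Running it in a loop, e.g. for f in *.vcf; do check_vcf_header "$f"; done, shows which files are missing a header. Posting the output of head on genes.bed and on one VCF would still be the most useful thing for debugging.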