I would like to annotate my VCF with Entrez Gene IDs. I have found ways to add the HGNC gene symbol and the Ensembl Gene ID (VEP, ANNOVAR), but nothing that adds Entrez Gene IDs directly. I would prefer not to translate from HGNC or Ensembl to Entrez, because I'm afraid information will get lost in that extra translation step.
Maybe a BED file with all Entrez Gene IDs would help, since I've found tools in Galaxy that annotate VCF files from BED files. Or maybe I'm just using the wrong term for Entrez Gene IDs: I mean, for example, the 7157 in http://www.ncbi.nlm.nih.gov/gene/7157.
$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | \
gunzip -c | \
java -jar dist/vcfpeekvcf.jar -f ncbi/snp/organisms/human_9606/VCF/00-All.vcf.gz -t GENEINFO -p NCBI_VCF_ | \
cut -f 1-8 | grep NCBI_VCF_GENEINFO | head
##INFO=<ID=NCBI_VCF_GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
22 16260678 rs5746333 G A 100 PASS AA=G|||;AC=3244;AF=0.647764;AFR_AF=0.3888;AMR_AF=0.5634;AN=5008;DP=8520;EAS_AF=0.9673;EUR_AF=0.6133;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.7638;VT=SNP
22 16264717 rs148113506 TA T 100 PASS AA=A|A|-|deletion;AC=2066;AF=0.41254;AFR_AF=0.3858;AMR_AF=0.4265;AN=5008;DP=53564;EAS_AF=0.4196;EUR_AF=0.4274;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.4162;VT=INDEL
22 16265110 rs2212121 C T 100 PASS AA=C|||;AC=416;AF=0.0830671;AFR_AF=0.0045;AMR_AF=0.1744;AN=5008;DP=22443;EAS_AF=0.1667;EUR_AF=0.0219;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.1012;VT=SNP
22 16267558 rs2010682 T C 100 PASS AA=C|||;AC=4111;AF=0.820887;AFR_AF=0.8434;AMR_AF=0.6758;AN=5008;DP=10404;EAS_AF=0.9762;EUR_AF=0.7097;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.8476;VT=SNP
22 16269466 rs2212127 T C 100 PASS AA=C|||;AC=3668;AF=0.732428;AFR_AF=0.6641;AMR_AF=0.6066;AN=5008;DP=2535;EAS_AF=0.9712;EUR_AF=0.6262;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.7771;VT=SNP
22 16269829 rs114833654 T A 100 PASS AA=A|||;AC=4085;AF=0.815695;AFR_AF=0.7186;AMR_AF=0.768;AN=5008;DP=7907;EAS_AF=0.9484;EUR_AF=0.7992;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.8609;VT=SNP
22 16277622 rs2845217 G A 100 PASS AA=A|||;AC=2911;AF=0.58127;AFR_AF=0.3298;AMR_AF=0.5216;AN=5008;DP=5436;EAS_AF=0.9167;EUR_AF=0.5467;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0.6534;VT=SNP
22 16285169 rs192723103 T G 100 PASS AA=T|||;AC=1;AF=0.000199681;AFR_AF=0;AMR_AF=0.0014;AN=5008;DP=23204;EAS_AF=0;EUR_AF=0;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0;VT=SNP
22 16285178 rs184299536 G C 100 PASS AA=G|||;AC=1;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=5008;DP=23166;EAS_AF=0;EUR_AF=0;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;SAS_AF=0;VT=SNP
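
If you then want the Entrez Gene IDs as plain values rather than embedded in the INFO column, a small parser is enough. The following is only a minimal sketch, not part of the tool above: the function name `parse_geneinfo` is hypothetical, and it assumes the `symbol:id` pairs separated by `|` described in the header line, under the `NCBI_VCF_GENEINFO` key produced by the `-p NCBI_VCF_` prefix in the command.

```python
def parse_geneinfo(info_field, key="NCBI_VCF_GENEINFO"):
    """Return a list of (gene_symbol, entrez_gene_id) tuples.

    `info_field` is the 8th column (INFO) of a VCF record.
    `key` is assumed to be the annotation name shown in the output above.
    """
    # INFO entries are separated by ';' and written as KEY=VALUE.
    for entry in info_field.split(";"):
        name, _, value = entry.partition("=")
        if name != key:
            continue
        pairs = []
        # Each pair is "symbol:id"; pairs are separated by '|'.
        for pair in value.split("|"):
            symbol, _, gene_id = pair.rpartition(":")
            if symbol and gene_id.isdigit():
                pairs.append((symbol, int(gene_id)))
        return pairs
    # No gene annotation on this record.
    return []


if __name__ == "__main__":
    # INFO column abbreviated from the first record in the output above.
    info = "AA=G|||;AC=3244;AF=0.647764;NCBI_VCF_GENEINFO=POTEH:23784;NS=2504;VT=SNP"
    for symbol, entrez_id in parse_geneinfo(info):
        print(symbol, entrez_id)
```

Splitting on `;` first matters here, because other INFO values such as `AA=G|||` also contain the `|` character.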
@wardweistra sorry to comment with a question. I have kind of spent all day reading up on variant calling and on how to get the causative gene(s) from VCF files. By causative gene I mean the gene that causes a particular phenotype. At this stage I'm just trying to understand the lingo. So by annotating a VCF, do you mean that every SNP (variant) will be assigned to a gene (or other feature)? If that's the case, will that produce a new file, or can the annotation be held in the VCF file itself? Any help is much appreciated.
P.S. I don't know how helpful this might be, but the tool SnpSift seems to do annotation: http://snpeff.sourceforge.net/SnpSift.html