When I go to the UCSC Table Browser and download the BED file for NM_000314.6 (gene = "PTEN"), I find that the first exon starts at position 89623194 in GRCh37/hg19 (!). Because BED start coordinates are 0-based, the exon starts at 89623194 + 1 = 89623195 in a 1-based coordinate system.
Now I've created an artificial VCF file:
10 89623195 C A . PASS
It contains one SNP at position 89623195, the first nucleotide of the first exon. I then annotate that SNP with the following snpEff command:
java -Xmx4g -jar ~/snpEff_latest_core/snpEff/snpEff.jar GRCh37.p13.RefSeq artificial.vcf
The ANN record that I get for NM_000314 is the following:
I want to focus on the HGVS nomenclature of that record: c.-1032C>A. If you navigate to the NCBI page of NM_000314, you will see that the coding sequence starts at position 1032 of the transcript. If I had introduced a SNP at transcript position 1031, the expected HGVS nomenclature would be c.-1C>A. Following that logic, my SNP at the first nucleotide of exon 1 (transcript position 1) should be named c.-1031C>A, but snpEff reports c.-1032C>A. Is there an error in the snpEff algorithm?
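To make the expectation above explicit, here is a minimal sketch of the arithmetic in Python. The function name `utr5_hgvs_offset` is made up for illustration and is not part of snpEff; it only assumes 1-based transcript positions and the HGVS rule that the last base before the first coding base is c.-1.

```python
def utr5_hgvs_offset(transcript_pos: int, cds_start: int) -> int:
    """Return the (negative) c. offset of a 1-based transcript position
    that lies upstream of the first coding base.

    Hypothetical helper for illustration only.
    """
    if transcript_pos >= cds_start:
        raise ValueError("position is not in the 5' UTR")
    # HGVS has no c.0: the base directly before the CDS start is c.-1.
    return transcript_pos - cds_start


# CDS start at transcript position 1032, as read from the NCBI record.
print(utr5_hgvs_offset(1031, 1032))  # -1
print(utr5_hgvs_offset(1, 1032))     # -1031
```

Note that this only reproduces the reasoning in the question. It is correct only if the annotated transcript really has its first coding base at position 1032, which depends on the transcript version and coordinates the annotation database uses.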
Mutalyzer output for chr10:g.89623195C>A (NC_000010.11:g.89623195C>A) with GRCh38:
with GRCh37 (hg19):
I used the UCSC hgTables function to export the coordinates of NM_000314 for the hg19 build, using the UCSC gene track and the known genes track. As you can see, the CDS start is 89624226.
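As a rough check, the same offset can be computed from the genomic coordinates. This sketch assumes that the UCSC `cdsStart` value is a 0-based start (like the BED exon start in the question), that the transcript is on the plus strand, and that the variant and the first coding base lie in the same exon, so no intron has to be skipped. The function name is hypothetical.

```python
def utr5_offset_same_exon(variant_pos1: int, cds_start0: int) -> int:
    """c. offset of a 1-based genomic position upstream of the CDS.

    Assumes plus strand and that both positions are in the same exon.
    Hypothetical helper for illustration only.
    """
    first_coding_base1 = cds_start0 + 1  # convert 0-based start to 1-based
    if variant_pos1 >= first_coding_base1:
        raise ValueError("variant is not upstream of the CDS start")
    return variant_pos1 - first_coding_base1


exon_start0 = 89623194   # first exon start from the BED file (0-based)
cds_start0 = 89624226    # CDS start from the UCSC export (assumed 0-based)

variant_pos1 = exon_start0 + 1  # 89623195, the VCF position
offset = utr5_offset_same_exon(variant_pos1, cds_start0)
print(f"c.{offset}C>A")  # c.-1032C>A
```

Under these assumptions the first coding base is at 89624227, which puts the variant 1032 bases upstream of it.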
Yes, everything is as expected here; c.-1032 is the correct HGVS notation (see the comments below Kevin's answer).
Hi Marvin,
There is no need to delete your question, especially when people have provided helpful reactions.