When I go to the UCSC Table Browser and get the BED file for NM_000314.6 (gene = "PTEN"), I find that the first exon starts at position 89623194 in GRCh37/hg19 (!). BED start coordinates are 0-based, so the exon starts at 89623194 + 1 = 89623195 in a 1-based coordinate system.
Now I've created an artificial VCF file:
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER
10 89623195 . C A . PASS
It contains one SNP at position 89623195, the first nucleotide of the first exon. I then use the following snpEff command to annotate that SNP:
java -Xmx4g -jar ~/snpEff_latest_core/snpEff/snpEff.jar GRCh37.p13.RefSeq artificial.vcf
The ANN record that I get for NM_000314 is the following:
A|5_prime_UTR_variant|MODIFIER|PTEN|PTEN|transcript|NM_000314.4|protein_coding|1/9|c.-1032C>A|||||1032|
I want to focus on the HGVS nomenclature of that record: c.-1032C>A. If you navigate to the NCBI page of NM_000314, you will see that the coding sequence starts at position 1032 of the transcript. If I had introduced a SNP at transcript position 1031, the expected HGVS nomenclature would be c.-1C>A. Following that logic, my SNP at the first nucleotide of exon 1 (transcript position 1) should have the HGVS nomenclature c.-1031C>A, but snpEff reports c.-1032C>A. Is there an error in the snpEff algorithm?
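The numbering I expected can be written down as a small sketch. The function name `utr5_hgvs_offset` is made up for illustration; it assumes 1-based transcript positions and relies on HGVS having no c.0, so the base just before the A of the start codon is c.-1.

```python
def utr5_hgvs_offset(tx_pos: int, cds_start: int) -> int:
    """Return the HGVS c. offset for a 1-based transcript position in the 5' UTR.

    Hypothetical helper: cds_start is the 1-based transcript position of the
    A of the start codon (c.1). There is no c.0, so cds_start - 1 maps to c.-1.
    """
    if not 1 <= tx_pos < cds_start:
        raise ValueError("position is not in the 5' UTR")
    return tx_pos - cds_start


# CDS start at transcript position 1032, as listed on the NCBI page
print(f"c.{utr5_hgvs_offset(1031, 1032)}")  # last UTR base
print(f"c.{utr5_hgvs_offset(1, 1032)}")     # first base of the transcript
```

With a CDS start at transcript position 1032, this gives c.-1 for position 1031 and c.-1031 for position 1, which is where my expectation comes from.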
Mutalyzer output for chr10:g.89623195C>A (NC_000010.11:g.89623195C>A) with GRCh38:
And with GRCh37 (hg19):
I used the UCSC Table Browser (hgTables) to export the coordinates of NM_000314 for the hg19 build, from the UCSC Genes track and the known genes table. As you can see, the CDS start is 89624226.
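The offset can also be checked directly on these genomic coordinates. This is only a sketch under a few assumptions: the exported CDS start is 0-based like the BED start, PTEN is on the plus strand, and the variant and the start codon lie in the same exon, so no intron has to be skipped. The function name is made up for illustration.

```python
def genomic_utr5_offset(variant_pos_1based: int, cds_start_0based: int) -> int:
    """HGVS c. offset of a plus-strand 5' UTR variant from genomic coordinates.

    Hypothetical helper. Assumes the variant and the start codon are in the
    same exon, so the genomic distance equals the transcript distance.
    """
    cds_start_1based = cds_start_0based + 1  # 0-based start -> 1-based position of c.1
    if variant_pos_1based >= cds_start_1based:
        raise ValueError("variant is not upstream of the CDS start")
    return variant_pos_1based - cds_start_1based


tx_start_0based = 89623194       # exon 1 start from the BED file (0-based)
cds_start_0based = 89624226      # CDS start from the Table Browser export
variant_pos = tx_start_0based + 1  # 89623195, the position used in the VCF

print(f"c.{genomic_utr5_offset(variant_pos, cds_start_0based)}")
```

Under these assumptions the first base of exon 1 comes out as c.-1032, which matches the snpEff record.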
Yes, everything is as expected here: c.-1032 is the correct HGVS notation (see the comments below Kevin's answer).
Hi Marvin,
There is no need to delete your question, especially when people have provided helpful reactions.
Cheers,
Wouter