Dear BioStars Leaders,
I was under the impression that the nucleotide sequence of a RefSeq mRNA would be identical to the corresponding sequence in the reference genome assembly. Please correct me if my understanding is wrong.
At chr1:877831 (GRCh37), the reference nucleotide is "T".
The transcript (gene) at this genomic location is NM_152486.2 (SAMD11).
The genomic coordinate chr1:877831 corresponds to the following positions in transcript NM_152486.2:
- Transcript position: 1107 (counted from the first base of the 5' UTR)
- Coding position (c.): 1027 (counted from the first base of the start codon)
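The relationship between the two numbering schemes can be sketched in a few lines of Python. This is only an illustration: the 5' UTR length of 80 is inferred from the two positions quoted above (1107 - 1027), not read from the RefSeq record, and the function names are made up for this example. Intronic and UTR positions, which HGVS writes differently, are not handled.

```python
# Assumption: 5' UTR length inferred from the positions quoted in this post,
# not taken from the NM_152486.2 annotation itself.
UTR5_LENGTH = 1107 - 1027  # 80


def transcript_to_coding(n_pos, utr5_length=UTR5_LENGTH):
    """Convert a 1-based transcript position to a c. position (CDS only)."""
    c_pos = n_pos - utr5_length
    if c_pos < 1:
        # Positions in the 5' UTR use c.-N notation, which is out of scope here.
        raise ValueError("position lies in the 5' UTR")
    return c_pos


def coding_to_transcript(c_pos, utr5_length=UTR5_LENGTH):
    """Convert a c. position (CDS only) back to a 1-based transcript position."""
    if c_pos < 1:
        raise ValueError("only positive c. positions are handled")
    return c_pos + utr5_length


print(transcript_to_coding(1107))  # 1027
print(coding_to_transcript(1027))  # 1107
```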
I looked up the nucleotide at position 1107 in RefSeq (NM_152486.2), and it is "C".
https://www.ncbi.nlm.nih.gov/nuccore/NM_152486.2?report=gbwithparts&log$=seqview
I entered the variant "chr1:g.877831T>C" into Mutalyzer (Position Converter tool), and it translated it to "NM_152486.2:c.1027T>C": https://www.mutalyzer.nl/position-converter?assembly_name_or_alias=GRCh37&description=chr1%3Ag.877831T%3EC
I then entered the converted description "NM_152486.2:c.1027T>C" into Mutalyzer (Name Checker tool), and it gives the error "T not found at position 1107, found C instead": https://www.mutalyzer.nl/name-checker?description=NM_152486.2%3Ac.1027T%3EC
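To make clear what kind of check fails here, the following is a rough Python sketch of a reference-base check for a simple c. substitution. It is not Mutalyzer's implementation or API. The function name, the regular expression, and the toy sequence are all hypothetical: the sequence is placeholder "A"s with a "C" at position 1107, not the real NM_152486.2 sequence, and the 5' UTR length of 80 is again the value inferred from 1107 - 1027.

```python
import re

# Matches only simple coding substitutions such as "NM_152486.2:c.1027T>C".
HGVS_SUB = re.compile(
    r"^(?P<acc>[^:]+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def check_substitution(description, transcript_seq, utr5_length):
    """Compare the stated reference base with the base in the transcript."""
    m = HGVS_SUB.match(description)
    if m is None:
        raise ValueError("not a simple c. substitution: " + description)
    n_pos = int(m["pos"]) + utr5_length      # c. position -> transcript position
    found = transcript_seq[n_pos - 1].upper()  # sequence string is 0-based
    expected = m["ref"]
    if found != expected:
        return f"{expected} not found at position {n_pos}, found {found} instead"
    return "ok"


# Toy transcript (placeholder bases), with C at 1-based position 1107.
toy_seq = "A" * 1106 + "C" + "A" * 10

print(check_substitution("NM_152486.2:c.1027T>C", toy_seq, 80))
```

With the genomic base "T" as the stated reference, the check fails in the same way as the Name Checker message quoted above, because the transcript sequence itself carries "C" at that position.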
I am really confused: the reference genome says the nucleotide is "T", but the mRNA says it is "C". It would be great if someone could explain this.
Addition to my original question: there is a dbSNP record for this variant, rs6672356: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=6672356
Thanks, gsr
Thank you for your answer.
I just wanted to add that RefSeq is a curated database not only of transcripts but also of genomes and proteins.