Question

Confusion: Why and how is a nucleotide in the reference genome different from the nucleotide in the RefSeq mRNA?

1

8.1 years ago

gsr9999 ▴ 320

Dear BioStars Leaders,

I was under the impression that the nucleotide sequence of an mRNA from RefSeq would be identical to the corresponding sequence in the reference genome assembly. Please correct me if my understanding is incorrect.

At this location, chr1:877831 (GRCh37), the reference nucleotide is "T":

https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A877831%2D877831&hgsid=581652983_N7DHUPIeX4dCBoDQHXu8dfrKaa0U

The transcript (gene) at this genomic location is NM_152486.2 (SAMD11).

The genomic coordinate chr1:877831 corresponds to the following positions in transcript NM_152486.2:

Transcript position: 1107 (counted from the start of the 5' UTR)
Coding position (c.): 1027 (counted from the start codon)
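
The two numbers above differ by a fixed offset: 1107 - 1027 = 80, which implies that the coding sequence starts at transcript position 81. A minimal sketch of the conversion, assuming that offset (it is inferred from the two positions quoted here, not read from the GenBank record) and ignoring UTR positions; the function names are hypothetical:

```python
# CDS start inferred from the positions in the question (1107 - 1027 + 1).
# This is an assumption, not a value read from the NM_152486.2 record.
CDS_START = 81


def coding_to_transcript(c_pos: int, cds_start: int = CDS_START) -> int:
    """Convert a 1-based coding (c.) position to a 1-based transcript position."""
    if c_pos < 1:
        raise ValueError("only positions inside the CDS (c. >= 1) are handled")
    return c_pos + cds_start - 1


def transcript_to_coding(n_pos: int, cds_start: int = CDS_START) -> int:
    """Convert a 1-based transcript position to a 1-based coding (c.) position."""
    if n_pos < cds_start:
        raise ValueError("position lies in the 5' UTR, which is not handled here")
    return n_pos - cds_start + 1


print(coding_to_transcript(1027))  # 1107
print(transcript_to_coding(1107))  # 1027
```

This only covers positions inside the coding sequence; UTR and intronic coordinates need the fuller HGVS numbering rules.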

I looked up the nucleotide at position 1107 in RefSeq (NM_152486.2), and it is "C".

https://www.ncbi.nlm.nih.gov/nuccore/NM_152486.2?report=gbwithparts&log$=seqview

I entered the variant "chr1:g.877831T>C" in Mutalyzer (Position Converter tool), and it translated it to "NM_152486.2:c.1027T>C": https://www.mutalyzer.nl/position-converter?assembly_name_or_alias=GRCh37&description=chr1%3Ag.877831T%3EC

After that, I entered the converted transcript description "NM_152486.2:c.1027T>C" in Mutalyzer (Name Checker tool), and it gives the error "T not found at position 1107, found C instead": https://www.mutalyzer.nl/name-checker?description=NM_152486.2%3Ac.1027T%3EC
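
The Name Checker error boils down to comparing the reference base stated in the variant description with the base actually present in the transcript sequence. A rough sketch of that check on a toy sequence (the sequence, the function name, and the message wording are made up for illustration; this is not Mutalyzer's code):

```python
def check_reference_base(sequence: str, position: int, expected: str) -> str:
    """Compare the expected reference base with the base at a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    found = sequence[position - 1].upper()
    expected = expected.upper()
    if found == expected:
        return f"OK: {expected} found at position {position}"
    return f"{expected} not found at position {position}, found {found} instead"


# Made-up 10-base sequence, not NM_152486.2.
toy_transcript = "ATGGCCACGT"

print(check_reference_base(toy_transcript, 6, "T"))
print(check_reference_base(toy_transcript, 6, "C"))
```

With the real record, the same comparison would be run against the NM_152486.2 sequence at position 1107.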

I am really confused about how the reference genome can say the nucleotide is "T" while the mRNA says it is "C". It would be great if someone could explain this.

Addition to my original question: there is a SNP record for this mutation in dbSNP, rs6672356: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=6672356

Thanks, gsr

Assembly genome sequence gene • 2.5k views

ADD COMMENT • link updated 8.1 years ago by Christian ★ 3.1k • written 8.1 years ago by gsr9999 ▴ 320

0

8.1 years ago

Christian ★ 3.1k

Most likely the reference genome and RefSeq represent different alleles. Less likely, but possible, are sequencing errors and RNA editing.

ADD COMMENT • link 8.1 years ago by Christian ★ 3.1k

score 4 · Accepted Answer · 2017-03-03

4

8.1 years ago

Petr Ponomarenko ★ 2.8k

RefSeq can differ from the reference genome. The two have completely different origins: RefSeq is a curated database of transcripts, while a reference genome is the result of a genome assembly, so the underlying sets of reads and the methods are very different. RefSeq is stable and does not depend on reference genome updates, which is why HGVS notation for reporting mutations says to use RefSeq rather than the reference genome.

In my practice, I have seen many differences between RefSeq and genomic references. The craziest and most unexpected one was a single-nucleotide deletion in the coding region!
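
Differences like these, including a single-base deletion, can be listed by comparing the transcript against the spliced genomic sequence. A minimal sketch using Python's standard `difflib` on toy sequences (both sequences and the `list_differences` name are invented for illustration; for real data a proper sequence aligner would be the safer choice):

```python
from difflib import SequenceMatcher


def list_differences(genomic: str, transcript: str):
    """List differences as (operation, 1-based genomic position,
    genomic bases, transcript bases).

    "delete" means the bases are present in the genomic sequence but
    absent from the transcript; "insert" is the reverse.
    """
    matcher = SequenceMatcher(None, genomic, transcript, autojunk=False)
    return [
        (tag, g_start + 1, genomic[g_start:g_end], transcript[t_start:t_end])
        for tag, g_start, g_end, t_start, t_end in matcher.get_opcodes()
        if tag != "equal"
    ]


# Toy spliced genomic sequence versus two made-up transcript versions.
genomic_toy = "ATGGCCTTAGGC"
print(list_differences(genomic_toy, "ATGGCCTAGGC"))   # one base missing
print(list_differences(genomic_toy, "ATGGCCTCAGGC"))  # one base substituted
```

Note that the reported position of a deletion inside a run of identical bases is somewhat arbitrary, since removing any base of the run gives the same sequence.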

ADD COMMENT • link 8.1 years ago by Petr Ponomarenko ★ 2.8k

0

Thank you for your answer.

ADD REPLY • link 8.1 years ago by gsr9999 ▴ 320

0

I just wanted to add that RefSeq is a curated database not only of transcripts but of genomes and proteins as well.

ADD REPLY • link 5.6 years ago by ensakz • 0