Question

Chromosomal location not matching the mRNA sequence

2.5 years ago

rutikagavate13 • 0

Hello everyone, I am trying to understand the rs6165 (ref allele C; alt alleles T/A/G) and rs6166 (ref allele C; alt allele T) variants.

When I look up rs6165 (c.919G>A) on the mRNA, position 919 shows A rather than the expected G.

Similarly, for rs6166 (c.2039G>A), position 2039 of the transcript mRNA (NM_000145.4) shows T as the reference.

Both variants are in the FSHR gene. Is the mismatch because the gene is located on the negative strand? Even so, I don't understand the purine/pyrimidine change. Also, all the literature I surveyed on FSHR reports 919A>G (instead of 919G>A) and 2039A>G (not 2039G>A).

The chromosomal position shows C for both.

chromosomal location of rs6165 (Shows C)

chromosomal location of rs6166 (Shows C)

mRNA location of c.919G>A (Shows A)

mRNA location of c.2039G>A (Shows T)


updated 2.5 years ago by Istvan Albert 102k • written 2.5 years ago by rutikagavate13 • 0

score 0 · Answer 1 · 2022-06-01

The rs6166 (c.2039G>A) nomenclature refers to a coding sequence coordinate (note the c. prefix), not to a position on the full transcript.
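To make the difference between the two coordinate systems concrete, here is a minimal shell sketch on a made-up toy transcript. The sequence, the cds_start value, and the helper names c_to_n and base_at are all hypothetical and chosen for illustration; none of them come from NM_000145.4. The only assumption about HGVS itself is that c. numbering starts at 1 on the A of the ATG start codon.

```shell
# Toy transcript: 5 bases of 5' UTR followed by a short CDS (ATG GCC CTG TAA).
# Every value here is invented for illustration, not taken from NM_000145.4.
transcript="CTTCTATGGCCCTGTAA"
cds_start=6   # 1-based transcript position of the A in ATG (hypothetical)

# Convert a coding (c.) position into a 1-based transcript position.
c_to_n() {
    echo $(( cds_start + $1 - 1 ))
}

# Print the single base found at a 1-based position of a sequence.
base_at() {
    printf '%s\n' "$1" | cut -c "$2"
}

base_at "$transcript" 4                # position 4 of the transcript: prints C
base_at "$transcript" "$(c_to_n 4)"    # c.4, i.e. transcript position 9: prints G
```

Looking up position 2039 directly on the full mRNA, as in the question, is the first kind of lookup; c.2039 calls for the second.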

That being said, HGVS nomenclature is difficult to read and verify without proper tooling, which sadly did not exist ... until the magical bio package came along ... :-)

Let's investigate with bio:

https://www.bioinfo.help/

Get the data:

bio fetch NM_000145.4 > NM_000145.gb

See what the data contains as a FASTA file; it is the entire transcript:

cat NM_000145.gb | bio fasta | head -2

prints:

>NM_000145.4 {"title": "Homo sapiens follicle stimulating hormone receptor (FSHR), transcript variant 1, mRNA", "type": "source"}
AGATCTCTTCTCATAAGGGCACTGTGTGGAGCTTCTGAGATCTGTGGAGGTTTTTCTCTG

Note how it says mRNA there, so this is the full transcript, untranslated regions included. Now let's print the CDS region only:

cat NM_000145.gb | bio fasta -type CDS | head -2

It now prints:

>NP_000136.2 {"type": "CDS", "gene": "FSHR", "product": "follicle-stimulating hormone receptor isoform 1 precursor", "locus": ""}
ATGGCCCTGCTCCTGGTCTCTTTGCTGGCATTCCTGAGCTTGGGCTCAGGATGTCATCAT

You can see that the transcript NM_000145 contains a coding sequence, labeled here with the accession of its protein product, NP_000136. Now let's check position 2039 (from c.2039G>A) on this coding sequence:

cat NM_000145.gb | bio fasta -type CDS -start 2039 -end 2039

Now it prints the G that we were looking for all along:

>NP_000136.2 {"type": "CDS", "gene": "FSHR", "product": "follicle-stimulating hormone receptor isoform 1 precursor", "locus": ""}
G
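The C shown at the chromosomal position in the question is consistent with this G if FSHR is transcribed from the minus strand, as the question suspects: the genome browser shows the plus strand, while the transcript reads the opposite one. A minimal sketch of that complement step, using only tr (the complement helper is hypothetical and not part of bio):

```shell
# Complement a DNA string base by base (A<->T, C<->G).
# Assumption: the gene lies on the minus strand, so a plus-strand genomic
# base corresponds to its complement on the transcript.
complement() {
    printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca'
}

complement C    # genomic reference allele C reads as G on the transcript
complement T    # genomic alternate allele T reads as A on the transcript
```

Under that assumption, a genomic C>T is the same change as G>A on the transcript, which is why a pyrimidine substitution on the chromosome appears as a purine substitution in the c. notation. For stretches longer than one base the sequence also has to be reversed, not just complemented.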