How does RefSeq get their transcript sequences?
10.0 years ago
pwg46 ▴ 540

I have been looking at the rna.fa.gz file in RefSeq's database. For the majority of the NMs, the sequences don't start with ATG. So I thought perhaps the rna file contains the entire sequence of the mRNA and not just the coding slices. I took an NM, the chromosomal CDS start position, and the chromosomal first exon start position (all of which I got from another data file provided by the RefSeq DB) to see where in the NM's sequence the coding region should begin. But even then, still no 'ATG'. Also, when there is a perfect map between an NM and an ENST, the NM's sequence given in the rna.fa file is completely different from the ENST's sequence given by Ensembl's own data file. The chromosomal positions of the ENST and NM match perfectly on the same chromosome (and on the same GRCh38 build), yet somehow the sequences they each give in their own data files are different. Could someone please clarify how RefSeq is coming up with their transcript sequences?

atg sequence refseq nm identifier • 3.8k views

Here is a link with the detailed process of curating RefSeq transcripts.


There's no reason to expect a transcript sequence to start with ATG; in fact, it usually won't, because the coding region is typically preceded by a 5' UTR. Unless you're looking at non-coding sequences, they should typically contain an ATG, though. Can you give an example of a mismatch between the RefSeq and corresponding Ensembl sequence for the exact same transcript?
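For checking where the start codon should fall inside an NM sequence, subtracting the first exon start from the CDS start is only enough when the transcript is on the plus strand and the CDS begins in the first exon. A rough sketch of the more general mapping is below. The function name `cds_start_in_transcript` and its arguments are made up for illustration, and the sketch assumes 0-based, half-open genomic coordinates (as in UCSC-style genePred tables); a 1-based file such as GFF3 would need its start positions reduced by one first.

```python
def cds_start_in_transcript(exons, cds_start, cds_end, strand):
    """Return the 0-based offset of the first coding base in the spliced mRNA.

    exons     -- list of (start, end) genomic intervals, 0-based half-open (assumed)
    cds_start -- lowest genomic coordinate of the CDS (inclusive)
    cds_end   -- highest genomic coordinate of the CDS (exclusive)
    strand    -- "+" or "-"
    """
    if strand == "+":
        ordered = sorted(exons)                # 5'->3' is ascending on the genome
        target = cds_start                     # first coding base
    elif strand == "-":
        ordered = sorted(exons, reverse=True)  # 5'->3' is descending on the genome
        target = cds_end - 1                   # translation starts at the high end
    else:
        raise ValueError("strand must be '+' or '-'")

    offset = 0  # number of transcript bases in exons already walked past
    for start, end in ordered:
        if start <= target < end:
            if strand == "+":
                return offset + (target - start)
            return offset + (end - 1 - target)
        offset += end - start  # introns are skipped, only exon lengths count
    raise ValueError("CDS start does not fall inside any exon")


# Toy coordinates, not a real transcript
exons = [(100, 150), (200, 300)]
plus_offset = cds_start_in_transcript(exons, 210, 290, "+")
minus_offset = cds_start_in_transcript(exons, 210, 290, "-")
```

With an offset in hand, `mrna[offset:offset + 3]` on the matching rna.fa record is where one would expect to see the start codon for a typical protein-coding transcript. On the minus strand the rna.fa sequence is already in transcript orientation, so it is the coordinates that get flipped, not the sequence.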
