Hi, can someone solve this RefSeq puzzle for me?
For the gene 0610010B08Rik, the UCSC Genome Browser says:
RefSeq Gene 0610010B08Rik
RefSeq: NM_001177543.1 Status: Validated
Description: Mus musculus RIKEN cDNA 0610010B08 gene (0610010B08Rik), mRNA.
CCDS: CCDS50826.1
Entrez Gene: 100039060
PubMed on Gene: 0610010B08Rik
PubMed on Product: KRAB box and zinc finger C2H2 type domain containing
Stanford SOURCE: NM_001177543
mRNA/Genomic Alignments
The alignment you clicked on is first in the table below.
BROWSER | SIZE  IDENTITY  CHROMOSOME  STRAND  START      END        QUERY         START  END   TOTAL
----------------------------------------------------------------------------------------------------
browser | 4539  100.0%    2           -       175192005  175338212  NM_001177543  1      4539  4539
browser | 4538  100.0%    2           -       175419391  175435777  NM_001177543  1      4539  4539
browser | 4538  100.0%    2           +       175640391  175656769  NM_001177543  1      4539  4539
browser | 4538  100.0%    2           +       175737942  175754328  NM_001177543  1      4539  4539
browser | 4538  100.0%    2           -       176470369  176486749  NM_001177543  1      4539  4539
browser | 4538  100.0%    2           -       176619933  176636319  NM_001177543  1      4539  4539
This alignment information is encoded in the refGene table (mm10) when you pull it from UCSC, which means that for the gene 0610010B08Rik there are 6 entries with the same NM ID (the same RNA). I always collapse multiple entries to the single longest entry, but in this case the entries for the same gene are on different strands. How is this possible?
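To make the collapsing step concrete, here is a minimal Python sketch of "keep the longest entry per NM ID", plus a check for transcripts whose entries fall on more than one strand. The function names `collapse_to_longest` and `strands_by_transcript` are made up for illustration, and the rows are just the six alignments from the table above typed in by hand rather than read from the real refGene file, so treat it as a sketch and not as a tested pipeline.

```python
from collections import defaultdict

# (transcript, chromosome, strand, start, end), copied from the alignment
# table in this post; a real run would read these fields from refGene.
ALIGNMENTS = [
    ("NM_001177543", "2", "-", 175192005, 175338212),
    ("NM_001177543", "2", "-", 175419391, 175435777),
    ("NM_001177543", "2", "+", 175640391, 175656769),
    ("NM_001177543", "2", "+", 175737942, 175754328),
    ("NM_001177543", "2", "-", 176470369, 176486749),
    ("NM_001177543", "2", "-", 176619933, 176636319),
]


def strands_by_transcript(rows):
    """Map each transcript ID to the set of strands it was placed on."""
    strands = defaultdict(set)
    for name, _chrom, strand, _start, _end in rows:
        strands[name].add(strand)
    return dict(strands)


def collapse_to_longest(rows):
    """Keep, per transcript ID, the row with the largest genomic span."""
    best = {}
    for row in rows:
        name, _chrom, _strand, start, end = row
        current = best.get(name)
        if current is None or (end - start) > (current[4] - current[3]):
            best[name] = row
    return best


if __name__ == "__main__":
    for name, strands in strands_by_transcript(ALIGNMENTS).items():
        if len(strands) > 1:
            print(name, "has entries on both strands:", sorted(strands))
    print(collapse_to_longest(ALIGNMENTS))
```

Note that "longest" here means the longest genomic span (end minus start), which picks the first row of the table only because its span is much larger than the others; it does not say anything about which placement is the biologically correct one.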
The RefSeq methods page says:
RefSeq RNAs were aligned against the mouse genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.1% of the best and at least 96% base identity with the genomic sequence were kept.
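As a rough illustration of that rule, the sketch below applies the two thresholds from the quoted text (within 0.1% of the best identity, and at least 96% identity) to the six alignments listed earlier. It assumes that the IDENTITY column shown by the browser is the "base identity" the methods text refers to, which I have not verified, and the function name `keep_near_best` is made up for this example.

```python
# (start, end, percent identity) for each alignment, as displayed in the
# table above. Assumption: the displayed IDENTITY is the value being filtered on.
HITS = [
    (175192005, 175338212, 100.0),
    (175419391, 175435777, 100.0),
    (175640391, 175656769, 100.0),
    (175737942, 175754328, 100.0),
    (176470369, 176486749, 100.0),
    (176619933, 176636319, 100.0),
]

MIN_IDENTITY = 96.0  # "at least 96% base identity with the genomic sequence"
NEAR_BEST = 0.1      # "within 0.1% of the best"


def keep_near_best(hits, min_identity=MIN_IDENTITY, near_best=NEAR_BEST):
    """Return the hits that pass both thresholds from the quoted rule."""
    if not hits:
        return []
    best = max(identity for _start, _end, identity in hits)
    return [
        hit for hit in hits
        if hit[2] >= min_identity and best - hit[2] <= near_best
    ]


if __name__ == "__main__":
    kept = keep_near_best(HITS)
    print(len(kept), "of", len(HITS), "alignments pass the filter")
```

Under that assumption, every placement ties for the best identity, so none of the six is filtered out, which would be consistent with refGene carrying all six rows for the same NM ID.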
Secondly, for my unique list, which entry should I take?
Thanks