Convert NM_ mRNA Position Into Corresponding GRCh37 Genomic DNA Position?
5
4
14.0 years ago
Krisr ▴ 470

I have a set of microRNA binding sites that are listed according to their NM_xxxxx mRNA sequence position. Does anyone know of a tool that would allow me to take these mRNA position values and convert them to the corresponding GRCh37 genomic DNA position values - or in other words the genomic DNA position from which it was transcribed?

dna transcript mirna coordinates conversion • 13k views
ADD COMMENT
4

This is a clarification rather than an answer. The NM_ sequences don't necessarily have a "true" position on GRCh37. They are derived independently of the reference genome, so they may or may not map to it (and when they do map, the match might not be 100%). Any mapping of these NM_ sequences to the genome is therefore just one of several possible mappings. Also keep in mind that a single NM_ sequence could map to multiple locations on the genome, since the genome has several copies of some genes.

ADD REPLY
0

Even if you could get a good mapping from RefSeq (the NM sequences plus some others) to the genome, you would still need to translate your binding site coordinates to their corresponding genome positions. A simple offset might not work if the alignment you're using contains indels. For example, if you have a binding site at positions 10-20 of NM_123, and NM_123 maps to a position on the genome, you might not be able to just "add 9" to the mapping position to find your binding site, because an indel might lie between the start of the alignment and the site.
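To make the indel point concrete, here is a minimal sketch that maps transcript positions through a list of ungapped alignment blocks instead of applying one offset. The block layout, the coordinates, and the transcript itself are made up for illustration; a real alignment (for example a PSL record) would supply the blocks.

```python
from typing import List, Optional, Tuple

# Each block is (transcript_start, genome_start, length): 0-based, ungapped,
# sorted by transcript_start. Gaps between blocks are introns or indels.
Block = Tuple[int, int, int]

def transcript_to_genome(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based transcript position to a 0-based genome position.

    Returns None when the position is not covered by any block, e.g. bases
    present in the transcript but absent from the genome.
    """
    for t_start, g_start, length in blocks:
        if t_start <= pos < t_start + length:
            return g_start + (pos - t_start)
    return None

# Hypothetical plus-strand alignment: transcript bases 0-11 align at genome
# position 1000, the genome then has 3 extra bases, and transcript bases
# 12-29 align starting at genome position 1015.
blocks = [(0, 1000, 12), (12, 1015, 18)]

site_start, site_end = 9, 19            # 1-based positions 10-20, 0-based
naive_end = 1000 + site_end             # single offset from alignment start
mapped_start = transcript_to_genome(site_start, blocks)
mapped_end = transcript_to_genome(site_end, blocks)
```

In this toy case the naive offset and the block-aware mapping agree for the start of the site but disagree for its end, because the gap falls inside the site.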

ADD REPLY
0

If you used some bioinformatics tool to find the binding sites in the first place, it might be easier to run the tool on the genome rather than using the results of running the tool on the RefSeq sequences.

ADD REPLY
5
14.0 years ago
Bio_X2Y ★ 4.4k

UCSC provides one possible mapping of RefSeq IDs to genomic coordinates. There are presumably many ways of getting this information - this is one:

  • On the UCSC homepage, click Tables.
  • For assembly, select GRCh37.
  • For track, select RefSeq Genes.
  • For output file, provide a suitable filename.
  • Click get output.

This will give you a file that maps RefSeq IDs to genomic coordinates (chromosome, strand, transcript start and end, and exon starts and ends).
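Once you have the exon starts and ends, the conversion is bookkeeping along the exons. The sketch below assumes the UCSC convention of 0-based, half-open exon coordinates in comma-terminated lists, and assumes the mRNA is exactly the concatenation of the annotated exons, which will not be true for every RefSeq. The row shown is invented and uses a reduced column order (name, chrom, strand, exonStarts, exonEnds); a real Table Browser file has more columns.

```python
from typing import List, Tuple

def parse_exons(exon_starts: str, exon_ends: str) -> List[Tuple[int, int]]:
    """Parse comma-terminated exonStarts/exonEnds into (start, end) pairs."""
    starts = [int(x) for x in exon_starts.strip(",").split(",")]
    ends = [int(x) for x in exon_ends.strip(",").split(",")]
    return list(zip(starts, ends))

def mrna_to_genome(pos: int, strand: str, exons: List[Tuple[int, int]]) -> int:
    """Map a 1-based mRNA position to a 1-based genomic position.

    Assumes the mRNA equals the spliced exons with no indels.
    """
    if pos < 1:
        raise ValueError("mRNA positions are 1-based")
    remaining = pos - 1  # 0-based offset into the spliced transcript
    # Walk exons in transcript order: reversed for minus-strand genes.
    ordered = exons if strand == "+" else exons[::-1]
    for start, end in ordered:
        length = end - start
        if remaining < length:
            if strand == "+":
                return start + remaining + 1  # 0-based start to 1-based
            return end - remaining            # count back from the exon end
        remaining -= length
    raise ValueError("position lies beyond the annotated exons")

# Made-up row: name, chrom, strand, exonStarts, exonEnds.
row = "NM_000000\tchr1\t-\t100,300,\t150,320,".split("\t")
exons = parse_exons(row[3], row[4])
first_base = mrna_to_genome(1, row[2], exons)
```

For a minus-strand gene the first mRNA base sits at the end of the last exon in genomic order, which is why the exon list is walked in reverse.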

If you need more information on the exact mapping rules used by UCSC, you will need to do a bit more digging. This thread from the UCSC genome mailing list might be a starting point.

ADD COMMENT
0

UCSC does have a coordinate mapping tool, called pslMap. You give it a PSL file with your coordinates plus the RefSeq-to-genome alignment, and it does the conversion for you.
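If your sites are just start/end pairs, you first have to express them as PSL records. Below is a rough sketch that writes one site as a single ungapped PSL block, with the site as the query and the transcript as the target. The site name, transcript name, and transcript size are placeholders, and you should check the field order against the PSL format description before relying on it. As I understand the usage, the result would then be passed as `pslMap sites.psl refseqToGenome.psl out.psl` (file names hypothetical).

```python
def site_to_psl(site_name: str, transcript: str, transcript_size: int,
                start: int, end: int) -> str:
    """Return one tab-separated PSL line placing a site on a transcript.

    start/end are 0-based, half-open transcript coordinates.
    """
    size = end - start
    if start < 0 or size <= 0 or end > transcript_size:
        raise ValueError("invalid site coordinates")
    fields = [
        size, 0, 0, 0,    # matches, misMatches, repMatches, nCount
        0, 0, 0, 0,       # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
        "+",              # strand
        site_name, size, 0, size,                 # qName, qSize, qStart, qEnd
        transcript, transcript_size, start, end,  # tName, tSize, tStart, tEnd
        1,                # blockCount
        f"{size},", "0,", f"{start},",            # blockSizes, qStarts, tStarts
    ]
    return "\t".join(str(f) for f in fields)

# Placeholder values: 1-based positions 10-20 of a 2000-base transcript.
psl_line = site_to_psl("site1", "NM_123", 2000, 9, 20)
```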

ADD REPLY
2
14.0 years ago

If you are a perl user, you might look at this: http://www.bioperl.org/wiki/Module:Bio::Coordinate::GeneMapper

Short of that, some bookkeeping based on mappings from UCSC, NCBI, or Ensembl is probably in order.

ADD COMMENT
0

I am familiar with Perl, and have just downloaded BioPerl with various modules. I am looking at the GeneMapper documentation, but it is not clear to me how to "locate" the position information of transcripts for conversion. Unfortunately there is no example code. Any pointers or info regarding the commands for this module would be greatly appreciated!

I have started another question here: Bio::Coordinate::Genemapper -- Genemapper Question

ADD REPLY
1
14.0 years ago
Laura ★ 1.8k

If you have an NM identifier, then BioMart from Ensembl or the Ensembl API might be your best bet, depending on how many you want to get.

For relatively small numbers (<100), filtering in BioMart on RefSeq DNA IDs is your best bet: http://www.ensembl.org/biomart/martview

For larger numbers, using the Ensembl API and its xref system might be better:

http://www.ensembl.org/info/data/api.html
http://lists.ensembl.org/pipermail/dev/2010-October/000223.html
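A different route from the Perl API linked above is the Ensembl REST service, which has a `map/cdna` endpoint that converts cDNA coordinates of an Ensembl transcript to genomic coordinates. The sketch below only builds the request URL against the GRCh37 REST server. It assumes you have already found the Ensembl transcript ID that corresponds to your NM_ accession, and the ID used here is a placeholder. The Ensembl transcript may also differ in sequence from the RefSeq record, in which case the cDNA positions will not line up.

```python
from urllib.parse import quote

# GRCh37 archive of the Ensembl REST service.
GRCH37_REST = "https://grch37.rest.ensembl.org"

def cdna_map_url(enst_id: str, start: int, end: int) -> str:
    """Build a map/cdna request URL for 1-based cDNA positions start..end."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return (f"{GRCH37_REST}/map/cdna/{quote(enst_id)}/{start}..{end}"
            "?content-type=application/json")

# Placeholder transcript ID, not a real record.
url = cdna_map_url("ENST00000000000", 10, 20)
```

Fetching the URL (for example with `urllib.request.urlopen`) should return JSON describing the genomic mappings.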

ADD COMMENT
0
14.0 years ago
Rm 8.3k

One way to retrieve the mappings from NCBI: place a list of accession numbers in a file called "nrlist" and run the following on the command line:

while read -r acn; do curl --silent "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${acn}&rettype=gb&retmode=text"; done < nrlist | grep -P -A1 "ACCESSION|PRIMARY_SPAN|/chromosome" | grep -P "ACCESSION|/chromosome|[0-9]-[0-9]"

"nrlist"

NR_037438
NR_037439
NR_037440
NR_037441
NR_037442

OUTPUT

ACCESSION   NR_037438
               1-105               AL354831.18        16992-17096       c
                    /chromosome="13"
ACCESSION   NR_037439
               1-111               AC020606.7         56706-56816
                    /chromosome="7"
ACCESSION   NR_037440
               1-74                Z97192.2           5021-5094         c
                    /chromosome="22"
ACCESSION   NR_037441
               1-24                AL022477.1         1-24              c
                    /chromosome="6"

It's a messy, long command-line script, but it works.
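If you want to do arithmetic with those span lines, a small parser helps. The sketch below reads one span line of the output above and maps a RefSeq position onto the primary sequence, assuming the span is ungapped and that the trailing "c" means the span lies on the complementary strand. Note that the result is a position on the listed clone accession (e.g. AL354831.18), not a GRCh37 chromosome coordinate, so a further mapping step would still be needed.

```python
import re
from typing import NamedTuple, Optional

class Span(NamedTuple):
    refseq_start: int
    refseq_end: int
    primary_id: str
    primary_start: int
    primary_end: int
    complement: bool

SPAN_RE = re.compile(r"^\s*(\d+)-(\d+)\s+(\S+)\s+(\d+)-(\d+)(?:\s+(c))?\s*$")

def parse_span(line: str) -> Optional[Span]:
    """Parse a line like '1-105  AL354831.18  16992-17096  c'."""
    m = SPAN_RE.match(line)
    if m is None:
        return None
    return Span(int(m[1]), int(m[2]), m[3], int(m[4]), int(m[5]), m[6] == "c")

def refseq_to_primary(pos: int, span: Span) -> int:
    """Map a 1-based RefSeq position into the primary sequence (ungapped)."""
    if not span.refseq_start <= pos <= span.refseq_end:
        raise ValueError("position outside the span")
    offset = pos - span.refseq_start
    if span.complement:
        return span.primary_end - offset  # complement: count back from the end
    return span.primary_start + offset

span = parse_span("   1-105     AL354831.18     16992-17096     c")
```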

ADD COMMENT
0
14.0 years ago
Krisr ▴ 470

Thanks for the replies!

I have looked into the GeneMapper module of BioPerl (I have only a little experience with this, but want to learn more!). Could anyone point me towards a tutorial or example code that applies GeneMapper to a problem like this?

ADD COMMENT
0

No problem. Rather than posting a follow-up question in an answer like this, I suggest you post a new question to the forum, with a link back to this question if you think it is still relevant.

ADD REPLY
0

Yes, if you have a new question then post a new question; to respond to answers, please comment under the answer or your original question.

ADD REPLY
