Question

Identifying Chromosomal Position from Gene ID, Mutations, and their Locations

1

11.1 years ago

alger_fredericks ▴ 10

Hi,

I have a large list of mutations that contains the gene name, the Entrez ID, the mutation (e.g. G>A), and the mutation location (e.g. c.230+1). I have even gone as far as converting the Entrez ID to an Ensembl ID and a RefSeq mRNA ID. Is there a way to convert any combination of this information, in batches, to the hg19 chromosomal position? I have tried several methods (Bioconductor, Mutalyzer, etc.), and none seems to offer a straightforward way to obtain the chromosomal position for both exonic and intronic mutations in batches.

Thanks,
AF

genome sequence • 9.9k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.1 years ago by alger_fredericks ▴ 10

2

If you don't have a transcript ID then you're pretty much screwed. It's often the case that a given gene will have multiple transcripts and since c.X coordinates are transcript-centric, it can be completely ambiguous which position is actually mutated. If all of the genes only have 1 transcript, then you could convert things.

ADD REPLY • link 11.1 years ago by Devon Ryan 105k

0

I don't have the specific transcript ID, but I do have all RefSeq transcript IDs for each gene. I was thinking along the lines of using all transcript IDs for each gene and then comparing the sequences to narrow it down. I know it wouldn't give me a definitive answer as to which transcript was used to name the mutation, but it would be a starting point. Given that I have a transcript ID to accompany the mutation location, is there any way to convert that to a chromosomal position?
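The "try every transcript of the gene" idea above can be sketched as a small batch builder. This is only an illustration: the function name `candidate_descriptions`, the input layout, and the accessions `NM_AAAAAA.1` / `NM_BBBBBB.2` are hypothetical placeholders, not real RefSeq records.

```python
def candidate_descriptions(transcripts_by_gene, mutations):
    """Pair every mutation with every known transcript of its gene.

    transcripts_by_gene: dict mapping gene symbol -> list of transcript IDs
    mutations: iterable of (gene symbol, c. location, change) tuples
    Returns one "TRANSCRIPT:c.LOCATIONCHANGE" string per combination.
    """
    descriptions = []
    for gene, location, change in mutations:
        # Genes without any listed transcript simply contribute nothing.
        for transcript in transcripts_by_gene.get(gene, []):
            descriptions.append(f"{transcript}:c.{location}{change}")
    return descriptions


# Placeholder accessions (hypothetical, for illustration only).
transcripts_by_gene = {"GENE1": ["NM_AAAAAA.1", "NM_BBBBBB.2"]}
mutations = [("GENE1", "230+1", "G>A")]

batch = candidate_descriptions(transcripts_by_gene, mutations)
print("\n".join(batch))
```

Each resulting line is one candidate description; which transcript the mutation was originally named against still has to be decided afterwards, for example by checking the reference base.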

ADD REPLY • link 11.1 years ago by alger_fredericks ▴ 10

0

That could work. Surprisingly, there's no obvious function to convert from transcript to genomic coordinates in R/Bioconductor (I found a discussion about writing one, but it looks like the person asking was able to just use a BioPerl function). It wouldn't actually be difficult to write. The steps would be something like this (for transcripts on the + strand):

1. Make/load a TranscriptDb (see GenomicFeatures; called TxDb in current releases).
2. Extract the transcript of interest from step 1.
3. Get a running sum of the exon widths as a vector.
4. Find the index of the first non-negative value of "step 3 - position"; that index is the exon number, and the position minus the running sum up to the previous exon is the (1-based) offset into that exon.
5. Add the offset from step 4, minus 1, to the start of the exon from step 4, and you have your position.

As mentioned, that wouldn't work as-is for transcripts on the - strand, but gives you the idea. For the +1 or similar intronic coordinates, it'd just be a slight tweak to what I wrote. There, you'd find the end of the exon from step 4 and then add the offset into the intron to it.
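The steps above, including the minus-strand and intronic tweaks, might look roughly like the sketch below. It is not a Bioconductor function; `transcript_to_genomic` and its arguments are made up for illustration. It assumes exons are given as 1-based inclusive genomic `(start, end)` pairs in transcript order (so descending genomic order on the - strand), and that the position is counted from the first base of the transcript. HGVS c. positions count from the A of the start codon, so the 5' UTR length would have to be added first.

```python
from itertools import accumulate


def transcript_to_genomic(exons, strand, tx_pos, intron_offset=0):
    """Map a 1-based transcript position to a 1-based genomic position.

    exons: list of (start, end) genomic pairs, 1-based inclusive,
           listed in transcript order (5' to 3' of the transcript).
    strand: "+" or "-".
    tx_pos: 1-based position within the spliced transcript.
    intron_offset: e.g. +1 for "230+1", -2 for "231-2", 0 for exonic.
    """
    widths = [end - start + 1 for start, end in exons]
    cumulative = list(accumulate(widths))  # running sum of exon widths
    if not 1 <= tx_pos <= cumulative[-1]:
        raise ValueError("position lies outside the transcript")

    # First exon whose running sum reaches the requested position.
    idx = next(i for i, total in enumerate(cumulative) if total >= tx_pos)
    bases_before_exon = cumulative[idx] - widths[idx]
    offset = tx_pos - bases_before_exon - 1  # 0-based offset into the exon

    start, end = exons[idx]
    if strand == "+":
        return start + offset + intron_offset
    # On the - strand the transcript runs towards smaller coordinates.
    return end - offset - intron_offset


# Toy gene with two 100-bp exons (coordinates are invented).
plus_exons = [(100, 199), (300, 399)]
minus_exons = [(300, 399), (100, 199)]

print(transcript_to_genomic(plus_exons, "+", 100, intron_offset=1))
print(transcript_to_genomic(minus_exons, "-", 100, intron_offset=1))
```

In the toy example, transcript position 100 is the last base of the first exon, so the `+1` offset lands on the first intronic base: one past the exon end on the + strand, one before the exon start on the - strand.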

ADD REPLY • link 11.1 years ago by Devon Ryan 105k

1

Is GenomicFeatures::transcriptLocs2refLocs a step in the right direction?

ADD REPLY • link 11.1 years ago by Martin Morgan ★ 1.6k

1

What specifically are the coordinates 'c.230+1' relative to?

ADD REPLY • link 11.1 years ago by Martin Morgan ★ 1.6k

0

The nucleotide position at which the mutation occurs relative to the pre-mRNA transcript.

ADD REPLY • link 11.1 years ago by alger_fredericks ▴ 10

0

Can you give the complete info you have for two or three of your mutations as an example, so I can try something out?

ADD REPLY • link 11.1 years ago by Bert Overduin ★ 3.7k

Ram · Answer 1 · 2014-09-01

1

10.9 years ago

P.Taschner ▴ 10

Mutalyzer's Position Converter should do the job. The batch option accepts a list of NM_ numbers in combination with the variants.

Example:

NM_058195.3:c.193+1G>A will result in NC_000009.11:g.21994137C>T

The Position Converter does not check the description. The Name Checker cannot check intronic variants unless you replace the NM_ accession with the corresponding RefSeqGene NG_ accession, combined with the gene symbol and the transcript variant number from the NG_ annotation:

NG_007485.1(CDKN2A_v001):c.193+1G>A
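Before submitting a batch, it can help to check that each line has the expected shape. The sketch below is not part of Mutalyzer; the regular expression and the helper names `parse_substitution` and `complement` are assumptions made for illustration. It only handles simple substitutions such as the two descriptions above, not deletions, insertions, or UTR positions like c.-12 or c.*5.

```python
import re

# Accession, optional "(selector)", c. position, optional intronic offset,
# and a single-base substitution REF>ALT.
HGVS_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"
    r"(?:\((?P<selector>[^)]+)\))?"
    r":c\.(?P<position>\d+)(?P<offset>[+-]\d+)?"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_substitution(description):
    """Split a simple c. substitution into its parts, or raise ValueError."""
    match = HGVS_SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f"not a simple c. substitution: {description!r}")
    return {
        "accession": match["accession"],
        "selector": match["selector"],         # None when absent
        "position": int(match["position"]),
        "offset": int(match["offset"] or 0),   # 0 for exonic positions
        "ref": match["ref"],
        "alt": match["alt"],
    }


def complement(base):
    """Complement a base, as needed for transcripts on the - strand."""
    return base.translate(COMPLEMENT)


parsed = parse_substitution("NM_058195.3:c.193+1G>A")
print(parsed)
print(complement(parsed["ref"]) + ">" + complement(parsed["alt"]))
```

The complemented change printed at the end is C>T, which matches the genomic description in the example: the transcript is on the minus strand, so the bases flip when reported on the chromosome.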

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by P.Taschner ▴ 10

0

Seems like a convenient tool, thanks!

ADD REPLY • link 10.9 years ago by Devon Ryan 105k

0

Ensembl's Variant Effect Predictor basically does the same.

ADD REPLY • link 10.9 years ago by Bert Overduin ★ 3.7k