Identifying Chromosomal Position from Gene ID, Mutations, and their Locations
10.5 years ago

Hi,

I have a large list of mutations that contains the gene name, the Entrez ID, the mutation (e.g. G>A), and the mutation location (e.g. c.230+1). I have even gone as far as converting the Entrez ID to the Ensembl ID and the RefSeq mRNA ID. Is there a way to convert any combination of this information in batches to identify the hg19 chromosomal position? I have tried several methods (Bioconductor, Mutalyzer, etc.) and none seems to have a straightforward way to obtain the chromosomal position for both exonic and intronic mutations in batches.

Thanks,
AF

genome sequence • 9.2k views

If you don't have a transcript ID then you're pretty much screwed. A given gene will often have multiple transcripts, and since c.X coordinates are transcript-centric, it can be completely ambiguous which position is actually mutated. If all of the genes have only one transcript, then you could convert things.

I don't have the specific transcript ID, but I do have all RefSeq transcript IDs for each gene. I was thinking along the lines of using all transcript IDs for each gene and then comparing the sequences to narrow it down. I know it wouldn't give me a definitive answer as to which transcript was used to name the mutation, but it would be a starting point. Given that I have a transcript ID to accompany the mutation location, is there any way to convert that to a chromosomal position?
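
To make the "compare the sequences to narrow it down" idea concrete, here is a minimal sketch. It assumes you have already fetched the coding sequence of every candidate transcript yourself; the function name, the transcript IDs and the toy sequences are all hypothetical. It only covers exonic positions, since an intronic base such as c.230+1 is not part of the transcript sequence.

```python
from typing import Dict, List


def candidate_transcripts(cds_by_transcript: Dict[str, str],
                          c_pos: int, ref_base: str) -> List[str]:
    """Return the transcripts whose base at 1-based coding position
    c_pos equals the reported reference base."""
    ref_base = ref_base.upper()
    return [
        tx for tx, seq in cds_by_transcript.items()
        # Skip transcripts that are too short to contain the position.
        if 1 <= c_pos <= len(seq) and seq[c_pos - 1].upper() == ref_base
    ]


# Toy data only: hypothetical IDs and sequences, not real RefSeq records.
toy_cds = {"TX_A": "ATGGCC", "TX_B": "ATGACC", "TX_C": "ATG"}
matches = candidate_transcripts(toy_cds, 4, "G")
```

A transcript that survives this filter is only consistent with the reported reference base; several transcripts can still match, so this narrows the list rather than settling it.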

That could work. Surprisingly, there's no obvious function to convert from transcript to genomic coordinates in R/Bioconductor (I found a discussion about writing one, but it looks like the person asking was able to just use a BioPerl function). This wouldn't actually be difficult to write. The steps would be something like (this is for transcripts on the + strand):

  1. Make/load a TranscriptDb (see GenomicFeatures)
  2. Extract the exons of the transcript of interest from step 1
  3. Get a running sum of the exon widths as a vector
  4. Get the index of the first exon whose running sum from step 3 is at least the position; that is the exon number. The offset into that exon is the position minus the running sum of the preceding exons.
  5. Add the offset from step 4, minus 1, to the genomic start of the exon from step 4 and then you have your position.

As mentioned, that wouldn't work as-is for transcripts on the - strand, but gives you the idea. For the +1 or similar intronic coordinates, it'd just be a slight tweak to what I wrote. There, you'd find the end of the exon from step 4 and then add the offset into the intron to it.
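
The steps above, including the - strand and the intronic offset, can be sketched roughly as follows. This is plain Python rather than Bioconductor, and everything in it is an assumption for illustration: the function name, the exon list format (1-based inclusive genomic start/end pairs) and the toy coordinates. It takes a position counted from the start of the transcript, so a c. coordinate would first need to be shifted by the position of the start codon within the transcript.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (genomic_start, genomic_end), 1-based inclusive


def transcript_to_genomic(exons: List[Exon], strand: str,
                          tx_pos: int, intron_offset: int = 0) -> int:
    """Map a 1-based transcript position, plus an optional intronic
    offset such as +1 or -2, to a genomic coordinate."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Put the exons in transcript (5' to 3') order.
    ordered = sorted(exons, reverse=(strand == "-"))
    consumed = 0  # running sum of the widths of the exons already passed
    for start, end in ordered:
        width = end - start + 1
        if tx_pos <= consumed + width:
            offset = tx_pos - consumed - 1  # 0-based offset into this exon
            if strand == "+":
                return start + offset + intron_offset
            # On the - strand, transcript positions run right to left.
            return end - offset - intron_offset
        consumed += width
    raise ValueError("position is beyond the end of the transcript")


# Toy exons (widths 100, 50, 80), not taken from any real gene.
toy_exons = [(100, 199), (300, 349), (500, 579)]
plus_exonic = transcript_to_genomic(toy_exons, "+", 120)
plus_intronic = transcript_to_genomic(toy_exons, "+", 100, intron_offset=1)
minus_intronic = transcript_to_genomic(toy_exons, "-", 80, intron_offset=1)
```

The intronic case works because a description like 100+1 always names an exon boundary base plus a distance into the intron, so the exonic base is mapped first and the offset is then applied in the direction of transcription.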

Is GenomicFeatures::transcriptLocs2RefLocs in the right direction?

What specifically are the coordinates 'c.230+1' relative to?

The nucleotide position at which the mutation occurs, relative to the pre-mRNA transcript.

Can you give the complete info you have for two or three of your mutations as an example, so I can try something out?

10.2 years ago
P.Taschner ▴ 10

Mutalyzer's Position Converter should do the job. The batch option accepts a list of NM_ numbers in combination with the variants.

Example:

NM_058195.3:c.193+1G>A will result in NC_000009.11:g.21994137C>T

The Position Converter will not check the description. The Name Checker cannot check intronic variants unless you replace the NM_ by the corresponding RefSeq Gene NG_ in combination with the Gene symbol and the transcript variant number from the NG_ annotation:

NG_007485.1(CDKN2A_v001):c.193+1G>A
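
If you need to build or sanity-check the batch input yourself, a small parser for descriptions of this shape may help. This is only a sketch: the regular expression and the names `Substitution`, `parse_substitution` and `complement_alleles` are my own, not part of Mutalyzer, and it handles only simple substitutions of the form accession:c.N(+/-offset)X>Y or accession:g.NX>Y.

```python
import re
from typing import NamedTuple, Tuple

HGVS_SUB = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):(?P<kind>[cg])\."
    r"(?P<pos>\d+)(?P<offset>[+-]\d+)?(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


class Substitution(NamedTuple):
    accession: str
    kind: str        # "c" (coding) or "g" (genomic)
    position: int
    offset: int      # intronic offset, 0 for exonic/genomic positions
    ref_allele: str
    alt_allele: str


def parse_substitution(description: str) -> Substitution:
    """Split a simple substitution description into its parts."""
    m = HGVS_SUB.match(description.strip())
    if m is None:
        raise ValueError(f"unsupported description: {description!r}")
    return Substitution(m["acc"], m["kind"], int(m["pos"]),
                        int(m["offset"] or 0), m["ref"], m["alt"])


def complement_alleles(ref: str, alt: str) -> Tuple[str, str]:
    """Complement both alleles, as needed for a - strand transcript."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return comp[ref], comp[alt]


coding = parse_substitution("NM_058195.3:c.193+1G>A")
genomic = parse_substitution("NC_000009.11:g.21994137C>T")
```

In the example above the genomic alleles (C>T) are the complement of the coding alleles (G>A), which is what you would expect for a transcript on the reverse strand.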

Seems like a convenient tool, thanks!

Ensembl's Variant Effect Predictor basically does the same.
