Hi,
I have a large list of mutations that contains the gene name, the Entrez ID, the mutation (e.g. G>A), and the mutation location (e.g. c.230+1). I have even gone as far as converting the Entrez ID to the Ensembl ID and the RefSeq mRNA ID. Is there a way to convert any combination of this information in batches to identify the hg19 chromosomal position? I have tried several methods (Bioconductor, Mutalyzer, etc.) and none seem to have a straightforward way to obtain the chromosomal position for both exonic and intronic mutations in batches.
Thanks,
AF
If you don't have a transcript ID then you're pretty much screwed. It's often the case that a given gene will have multiple transcripts and since c.X coordinates are transcript-centric, it can be completely ambiguous which position is actually mutated. If all of the genes only have 1 transcript, then you could convert things.
I don't have the specific transcript ID, but I do have all RefSeq transcript IDs for each gene. I was thinking along the lines of using all transcript IDs for each gene, and then comparing the sequences to narrow it down. I know it wouldn't give me a definitive answer as to which transcript was used to name the mutation, but it would be a starting point. Given that I have a transcript ID to accompany the mutation location, is there any way to convert that to a chromosomal position?
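The narrowing-down idea can be sketched roughly as follows. This is only an illustration in Python, assuming you already have the coding sequence of each candidate transcript as a string; the function name, the accessions, and the sequences are hypothetical. It only checks exonic positions, since an intronic position such as c.230+1 would need the genomic sequence instead.

```python
def transcripts_matching_ref(cds_by_transcript, c_index, ref_base):
    """Return the transcripts whose coding sequence carries ref_base
    at the 1-based coding position c_index (exonic positions only)."""
    if c_index < 1:
        raise ValueError("c_index must be a positive 1-based position")
    keep = []
    for tx, cds in cds_by_transcript.items():
        # Skip transcripts that are too short to contain the position.
        if c_index <= len(cds) and cds[c_index - 1].upper() == ref_base.upper():
            keep.append(tx)
    return keep


# Hypothetical accessions and sequences, for illustration only.
candidates = {
    "NM_HYPOTHETICAL_1": "ATGGCCTGA",
    "NM_HYPOTHETICAL_2": "ATGACCTGA",
    "NM_HYPOTHETICAL_3": "ATG",
}
matching = transcripts_matching_ref(candidates, 4, "G")
```

A transcript that survives this filter is only compatible with the reported reference base, not confirmed as the one used to name the mutation.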
That could work. There's surprisingly no obvious function to convert from transcript to genomic coordinates in R/Bioconductor (I found a discussion about writing one, but it looks like the person asking was able to just use a bioperl function). This wouldn't actually be difficult to write. The steps would be something like (this is for transcripts on the + strand):
As mentioned, that wouldn't work as-is for transcripts on the - strand, but gives you the idea. For the +1 or similar intronic coordinates, it'd just be a slight tweak to what I wrote. There, you'd find the end of the exon from step 4 and then add the offset into the intron to it.
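For what it's worth, here is a minimal sketch of that kind of conversion in Python, not the exact steps from the answer above. It assumes you can get, for each transcript, the exon boundaries as 1-based inclusive genomic coordinates, the genomic position of the A of the start codon (c.1), and the strand. All function and parameter names are made up for the example, and it does not handle UTR positions (c.-N or c.*N).

```python
import re


def parse_c_position(pos):
    """Split 'c.230+1' into (230, 1), 'c.231-2' into (231, -2), 'c.230' into (230, 0)."""
    m = re.fullmatch(r"(?:c\.)?(\d+)([+-]\d+)?", pos)
    if m is None:
        raise ValueError(f"unsupported position: {pos!r}")
    return int(m.group(1)), int(m.group(2) or 0)


def c_to_genomic(pos, exons, cds_start, strand="+"):
    """Convert a coding position to a genomic coordinate.

    exons     : list of (start, end), 1-based inclusive genomic coordinates
    cds_start : genomic coordinate of c.1 (the A of the ATG)
    strand    : '+' or '-'
    """
    base, offset = parse_c_position(pos)
    # Walk the exons in transcript order (descending coordinates on '-').
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = base  # coding bases still to walk
    for start, end in ordered:
        if strand == "+":
            if end < cds_start:
                continue  # exon lies entirely in the 5' UTR
            first = max(start, cds_start)
            length = end - first + 1
            if remaining <= length:
                # Intronic offsets move further along the + strand.
                return first + remaining - 1 + offset
        else:
            if start > cds_start:
                continue  # exon lies entirely in the 5' UTR
            first = min(end, cds_start)
            length = first - start + 1
            if remaining <= length:
                # On the - strand, "downstream" means smaller coordinates.
                return first - (remaining - 1) - offset
        remaining -= length
    raise ValueError(f"{pos} lies beyond the last exon")


# Toy transcript: two exons, coding sequence starting at genomic position 150.
toy_exons = [(100, 200), (300, 400)]
plus_site = c_to_genomic("c.51+1", toy_exons, cds_start=150, strand="+")
minus_site = c_to_genomic("c.51+1", toy_exons, cds_start=350, strand="-")
```

For transcripts on the - strand the alleles also have to be complemented when they are reported on the genomic reference, so a G>A in the transcript becomes C>T on the chromosome.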
Is GenomicFeatures::transcriptLocs2RefLocs in the right direction?
What specifically are the coordinates 'c.230+1' relative to?
The nucleotide position at which the mutation occurs, relative to the pre-mRNA transcript.
Can you give the complete info you have for two or three of your mutations as an example, so I can try something out?
Mutalyzer's Position Converter should do the job. The batch option accepts a list of NM_ numbers in combination with the variants.
Example: NM_058195.3:c.193+1G>A will result in NC_000009.11:g.21994137C>T
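If the transcript used to name each mutation is unknown, one way to prepare the batch input is to pair every mutation with every RefSeq transcript of its gene. The Python sketch below only builds the description strings in the NM_:c. form shown in the example. The function name and the gene label are hypothetical, and writing one description per line is an assumption about the batch format, so check it against the Mutalyzer batch instructions.

```python
def build_batch_descriptions(mutations, transcripts_by_gene):
    """Combine each mutation with every candidate transcript of its gene.

    mutations           : iterable of (gene, c_position, change),
                          e.g. ("GENE_A", "c.193+1", "G>A")
    transcripts_by_gene : dict mapping gene -> list of versioned NM_ accessions
    """
    descriptions = []
    for gene, c_position, change in mutations:
        # Genes without any known transcript simply produce no lines.
        for transcript in transcripts_by_gene.get(gene, []):
            descriptions.append(f"{transcript}:{c_position}{change}")
    return descriptions


# "GENE_A" is a placeholder label; the accession is the one from the example.
lines = build_batch_descriptions(
    [("GENE_A", "c.193+1", "G>A")],
    {"GENE_A": ["NM_058195.3"]},
)
batch_text = "\n".join(lines)
```

Each mutation then yields one genomic candidate per transcript, which still has to be narrowed down afterwards.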
The Position Converter will not check the description. The Name Checker cannot check intronic variants unless you replace the NM_ by the corresponding RefSeq Gene NG_ in combination with the gene symbol and the transcript variant number from the NG_ annotation: