I have a list of mutation hotspots, each given as a gene name and an amino acid change (for example, gene: NOTCH1, amino acid: L1574). I have data in this format for several genes. Is there a way to get the genomic coordinates, and the exon each hotspot belongs to, for all of them at once? Right now I am looking them up one by one, which takes a lot of time. Any help would be appreciated.
You did not link the two posts, so no one knows that you are asking two sets of online volunteers to spend their time on your problem without telling either group about the other.
Interesting. I have had similar tasks before, and this is actually pretty hard to do perfectly in bioinformatics. The issue is that these hotspot notations come from the old days, when each gene was assumed to have only one transcript; nowadays most genes have many transcripts, and sometimes even the original "one" transcript has changed.
1. You need to determine which reference genome you are using, hg19 or hg38. This is the easy part.
2. You need to determine the gene model (annotation set, for example RefSeq or GENCODE) you want to use. This is easy but has a caveat: in some edge cases the particular protein change might not exist in your gene model.
3. Within the same gene model, each gene can have many transcripts, and you need to figure out the main transcript. This is difficult.
If you don't have too many such hotspots, the best bet is to check whether someone has already compiled a hotspot <-> coordinate mapping table for the major hotspots, so you can skip steps 2 and 3. For the ones missing from that table, you have to work them out one by one.
If you want to do steps 2 and 3 yourself, the recent MANE transcript annotation could help a lot, but it won't solve all the edge cases.
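Once you have settled on a reference and a single transcript, the remaining step is arithmetic: walk the coding parts of the exons in transcript order and find where the codon's three bases land. Here is a minimal Python sketch of that idea. The function names (`parse_hotspot`, `codon_to_genomic`) are made up for illustration, and the exon coordinates in the example are toy values, not real NOTCH1 coordinates; in practice you would pull the CDS intervals for your chosen transcript from its GTF/GFF annotation.

```python
import re
from typing import List, Tuple


def parse_hotspot(change: str) -> Tuple[str, int]:
    """Split a hotspot such as 'L1574' into (reference residue, 1-based position)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", change.strip())
    if match is None:
        raise ValueError(f"cannot parse hotspot: {change!r}")
    return match.group(1), int(match.group(2))


def codon_to_genomic(
    aa_pos: int,
    cds_exons: List[Tuple[int, int, int]],
    strand: str,
) -> List[Tuple[int, int]]:
    """Return [(exon_number, genomic_pos), ...] for the three bases of a codon.

    cds_exons holds (exon_number, start, end) for the CODING part of each
    exon of ONE transcript, 1-based inclusive, start <= end.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Put the exons in transcript (5' -> 3') order.
    ordered = sorted(cds_exons, key=lambda e: e[1], reverse=(strand == "-"))
    # 0-based offsets of the codon's bases within the spliced CDS.
    wanted = [3 * (aa_pos - 1) + i for i in range(3)]
    result = []
    consumed = 0  # CDS bases covered by the exons seen so far
    for exon_number, start, end in ordered:
        length = end - start + 1
        for offset in wanted:
            if consumed <= offset < consumed + length:
                within = offset - consumed
                pos = start + within if strand == "+" else end - within
                result.append((exon_number, pos))
        consumed += length
    if len(result) != 3:
        raise ValueError("amino acid position lies outside the CDS")
    return result


# Toy transcript on the minus strand (made-up coordinates).
toy_exons = [(1, 301, 306), (2, 201, 209)]
residue, position = parse_hotspot("L3")
print(residue, position, codon_to_genomic(position, toy_exons, "-"))
```

A codon can straddle an exon boundary, which is why the function returns an exon number per base rather than one exon per hotspot. The sketch does not check that the reference residue (the "L") actually matches the transcript sequence at that position; that check is worth adding, because a mismatch is usually the first sign that you picked the wrong transcript.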
See: Amino Acid Change To Genomic Location; Amino Acid Change To Genomic Location: using 'backlocate'; rs number of an article PMID: 15816807
cross posted: https://stackoverflow.com/questions/78280030/