Hey Biostars,
I have two related but distinct problems, and I'd like to ask whether there is an easy way to solve either one.
Problem 1: My collaborators provided me a file with Gene Name, cDNA, and protein change but no unique identifiers (!!), as follows:
gene alteration cdna
AKT1 E17K c.49G>A
AKT1 G162S c.484G>A
AKT1 G162D c.485G>A
AKT1 R174C c.520C>T
AKT1 R174C c.520C>T
AKT1 A188G c.563C>G
ALK H976H c.2928C>T
ALK N986N c.2958C>T
ALK E994K c.2980G>A
ALK P999L c.2996C>T
ALK C1008Y c.3023G>A
ALK H1063P c.3188A>C
ALK E1065K c.3193G>A
ALK L1080R c.3239T>G
ALK T1090I c.3269C>T
ALK G1184W c.3550G>T
ALK V1185V c.3555G>T
ALK E1197K c.3589G>A
First, I know this is technically not solvable without what amounts to inference. That said, is there any non-awful way to correctly map these variants? For instance, I suppose I could pull the MANE transcript isoform and check whether it corresponds to the cDNA alteration. If it corresponds perfectly in every case, I think I could surmise that I have the correct transcript isoform ... I can definitely do this, but I wanted to see whether anyone has faced this problem before. What did you do? There are many genes, so this cannot be done manually.
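To make the check described above concrete, here is a minimal sketch of what "corresponds" could mean for one variant: the reference base at the cDNA position matches, the position falls in the stated codon, and translating the reference and mutated codons gives the stated amino acids. The function name `is_consistent` and the toy coding sequence in the usage note are made up for illustration; a real run would need the actual CDS of each candidate transcript, and this only handles simple substitutions of the form `c.49G>A`.

```python
import re

# Standard genetic code, built from the usual compact TCAG-ordered table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def is_consistent(cds, protein_change, cdna_change):
    """Return True if a substitution like 'c.49G>A' on this CDS yields 'E17K'.

    cds is assumed to start at the A of the ATG start codon (c.1).
    Hypothetical helper: only single-base substitutions are supported.
    """
    p = re.fullmatch(r"([A-Z*])(\d+)([A-Z*])", protein_change)
    c = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", cdna_change)
    if not p or not c:
        return False
    ref_aa, aa_pos, alt_aa = p.group(1), int(p.group(2)), p.group(3)
    c_pos, ref_nt, alt_nt = int(c.group(1)), c.group(2), c.group(3)
    if not 1 <= c_pos <= len(cds):
        return False
    idx = c_pos - 1                      # 0-based index into the CDS
    if cds[idx] != ref_nt:               # reference base must match
        return False
    if idx // 3 + 1 != aa_pos:           # cDNA position must fall in the stated codon
        return False
    start = 3 * (idx // 3)
    codon = cds[start:start + 3]
    if len(codon) < 3:
        return False
    offset = idx % 3
    mutated = codon[:offset] + alt_nt + codon[offset + 1:]
    return CODON_TABLE.get(codon) == ref_aa and CODON_TABLE.get(mutated) == alt_aa
```

With a toy sequence such as `"ATG" + "GCT" * 15 + "GAG" + "TAA"` (not a real gene), `is_consistent(seq, "E17K", "c.49G>A")` passes because codon 17 is GAG and the mutated codon AAG encodes K. Passing this check does not prove the transcript is the one the collaborators used, only that it is compatible with the row.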
Problem 2: Suppose I am ultimately able to solve the problem above (by "solve" I mean accurately annotate every variant with an unambiguous label of some kind: an rsID, a chr:pos pair and a build, or what have you). The next thing I would like to do is determine how far each variant is from a given annotation. Specifically, I want to know how far each variant is from the nearest intron-exon junction, or the nearest known splice site.
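As a rough illustration of the distance calculation, the sketch below takes the exons of one transcript as `(start, end)` pairs and reports how far a genomic position is from the nearest internal exon boundary. Everything here is an assumption for the example: 1-based inclusive coordinates on a single chromosome, a junction defined as the last base of one exon or the first base of the next, and the helper names `junction_positions` and `distance_to_nearest_junction`. The exon coordinates would have to come from whatever annotation (for example a GTF) matches the chosen build.

```python
from bisect import bisect_left

def junction_positions(exons):
    """Internal exon boundaries of one transcript, as a sorted list.

    exons: iterable of (start, end) pairs, assumed 1-based and inclusive.
    The transcript start and end are skipped since they are not splice junctions.
    """
    exons = sorted(exons)
    junctions = []
    for (_, end), (next_start, _) in zip(exons, exons[1:]):
        junctions.extend([end, next_start])
    return sorted(junctions)

def distance_to_nearest_junction(pos, junctions):
    """Absolute distance in bases from pos to the closest junction, or None."""
    if not junctions:
        return None
    i = bisect_left(junctions, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = junctions[max(i - 1, 0):i + 1]
    return min(abs(pos - j) for j in candidates)
```

For exons `[(100, 200), (300, 400), (500, 600)]`, a variant at position 210 sits 10 bases into the first intron, so the function returns 10. Strand and the donor/acceptor distinction are ignored here and would need to be added for a real splice-site analysis.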
Open to any tools ...
Thank you
Problem 1: You have to go back to your collaborators and ask for either the unique IDs or the raw data. This is an untenable situation.
Problem 2: Bedtools is probably the answer here.
I'm writing a function to map the cDNA changes to every known transcript isoform, then determine a fit score for each. If the results make no sense (they will, though), I'll go back to them.
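One simple way such a fit score could look, purely as a sketch: for each candidate isoform, count the fraction of the gene's cDNA changes whose stated reference base actually appears at that position in the isoform's coding sequence, then rank isoforms by that fraction. The names `ref_base_fit` and `rank_isoforms`, and the dictionary of isoform sequences, are hypothetical; the score is only one possible choice and ignores the protein column entirely.

```python
import re

def ref_base_fit(cds, cdna_changes):
    """Fraction of 'c.N REF>ALT' changes whose REF base matches cds at position N.

    cds is assumed to start at c.1; unparseable entries count as misses.
    """
    if not cdna_changes:
        return 0.0
    hits = 0
    for change in cdna_changes:
        m = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", change)
        if not m:
            continue
        pos, ref = int(m.group(1)), m.group(2)
        if 1 <= pos <= len(cds) and cds[pos - 1] == ref:
            hits += 1
    return hits / len(cdna_changes)

def rank_isoforms(isoforms, cdna_changes):
    """Return (score, isoform_name) pairs, best-fitting isoform first.

    isoforms: dict mapping a transcript name to its coding sequence.
    """
    scored = [(ref_base_fit(seq, cdna_changes), name) for name, seq in isoforms.items()]
    return sorted(scored, reverse=True)
```

An isoform scoring 1.0 for every variant in a gene is a plausible candidate, while ties between isoforms mean the variants alone cannot tell them apart.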
For 2, thanks very much, I will check the docs.
VAL