4.1 years ago
imaparna27
I am trying to add Ensembl IDs to the genes and interaction information I collected from publications. I am facing two issues:
- The publication gives no isoform information, but the Ensembl ID is isoform-specific for some genes (example: miR-7).
- The publication names an isoform, but the Ensembl ID carries no isoform details (example: PI3K).
How should I include these two kinds of data in my work from an interaction perspective?
In Ensembl, a gene corresponds to a chromosomal locus, so the best way to disambiguate gene names is to map the associated sequence information to the Ensembl genome. Sequence information may not be provided in the paper you're curating, but it could be available in one of its references (for example, when authors reuse a reagent described in a previous paper). If there's no sequence information, then all you can do is try to match names using Ensembl Xrefs. Sometimes you'll also need context to interpret the name. For example, PI3K is a family of phosphoinositide kinases, but the paper could make it clear which member is relevant. When trying to match names, I would suggest using gene symbols as the unifying identifiers rather than Ensembl IDs, as this is closer to most biologists' notion of a gene and so may be easier to translate from papers without sequence information.
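As a rough illustration of the name-matching step, here is a minimal Python sketch built around the Ensembl REST `xrefs/symbol` lookup. The helper names (`xref_symbol_url`, `fetch_xrefs`, `gene_ids`, `classify`) are my own, and I'm assuming each returned record carries an `id` and a `type` field; check the current REST documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def xref_symbol_url(symbol, species="homo_sapiens"):
    """Build the URL for an Ensembl REST xrefs/symbol lookup."""
    path = "/xrefs/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return ENSEMBL_REST + path + "?content-type=application/json"


def fetch_xrefs(symbol, species="homo_sapiens"):
    """Ask Ensembl which objects are cross-referenced to a name (needs network)."""
    with urllib.request.urlopen(xref_symbol_url(symbol, species)) as response:
        return json.load(response)


def gene_ids(xref_records):
    """Keep only gene-level hits; assumes each record has 'id' and 'type' keys."""
    return sorted({r["id"] for r in xref_records if r.get("type") == "gene"})


def classify(xref_records):
    """Label a lookup as unmatched, unique or ambiguous at the gene level."""
    ids = gene_ids(xref_records)
    if not ids:
        return "unmatched", ids
    if len(ids) == 1:
        return "unique", ids
    return "ambiguous", ids
```

Calling `classify(fetch_xrefs("PIK3CA"))` would then tell you whether a name resolves to a single gene. Anything that comes back "unmatched" or "ambiguous" (a family name such as PI3K, say) is where you need the context from the paper.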
Thanks a lot for your response. Please help me with one more query regarding it:
*In Ensembl a gene corresponds to a chromosome locus so the best way to disambiguate gene names is to map associated sequence information to the Ensembl genome*
You could map to the transcriptome instead; that would ignore anything not predicted to be expressed and reduce the number of candidates. Beyond this, without the whole sequence of a precursor form of the miRNA, you may still end up with more than one possible gene of origin. But mapping a small oligonucleotide is better than nothing.
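To make the "more than one possible gene of origin" point concrete, here is a toy Python sketch of mapping a short oligonucleotide to a set of transcript sequences by exact matching. The function names and the `transcripts` layout (transcript ID mapped to a gene ID and sequence) are hypothetical; for real data you would load Ensembl cDNA/ncRNA FASTA files and probably use a proper short-read aligner rather than substring search.

```python
def normalise(seq):
    """Upper-case and convert RNA to DNA letters so miRNA sequences match cDNA."""
    return seq.upper().replace("U", "T")


def candidate_genes(oligo, transcripts):
    """Return {gene_id: [transcript_id, ...]} for transcripts containing the oligo.

    transcripts: dict mapping transcript_id -> (gene_id, sequence).
    Only exact, sense-strand matches are reported in this sketch.
    """
    query = normalise(oligo)
    hits = {}
    for transcript_id, (gene_id, sequence) in transcripts.items():
        if query in normalise(sequence):
            hits.setdefault(gene_id, []).append(transcript_id)
    return hits


def is_ambiguous(hits):
    """True when the oligo is compatible with more than one gene of origin."""
    return len(hits) > 1
```

If `candidate_genes` returns several genes for a mature miRNA sequence, record all of them as candidates rather than picking one, unless the paper or its references give the precursor sequence.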