Hello everyone,
I have the positions of multiple genes in a reference genome (G1), and I need to find the positions (start,end,strand) of those genes in another genome (G2). I'm not sure how to do it!
Thank you,
As far as I know, every gene has a stable ID that is assigned uniquely to it (for human genes you even have HGNC symbols), so you can use this ID to look up the gene's position and the other genomes that contain it. The biggest databases would be EnsEMBL (www.ensembl.org) and NCBI (https://www.ncbi.nlm.nih.gov). I believe both allow you to search by gene ID/name, sequence or even description/keywords. Each gene has its own "gene page" where you can see other genomes that contain it.
Thanks for your response. The problem is that in the data I have, some genes have names but others only have a locus tag, which is not consistent among different genomes. I was looking for an app/script that can extract the sequence of a gene from genome G1, find the exact sequence in genome G2, and report the start and end. Is that a good approach? I might be able to write such a script myself, but I don't know whether the exact sequence exists in other genomes or whether I need to account for indels. Any help would be appreciated.
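For what it's worth, the exact-match idea described above can be sketched in a few lines of plain Python. This is only a minimal sketch under some assumptions: both genomes are already loaded as single strings (one contig each), coordinates are 1-based and inclusive as in GFF files, and the function names (`reverse_complement`, `extract_gene`, `find_exact`) are hypothetical helpers, not part of any library. It checks both strands of G2 and reports every exact hit.

```python
# Minimal sketch: exact-match lookup of a G1 gene in G2 (no mismatches or indels).
# Assumptions: genomes are plain strings, coordinates are 1-based inclusive,
# and all helper names below are hypothetical.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_gene(genome, start, end, strand):
    """Return the gene sequence as read on its own strand."""
    region = genome[start - 1:end]
    return reverse_complement(region) if strand == "-" else region


def find_exact(gene_seq, genome):
    """Return every exact hit of gene_seq in genome as (start, end, strand)."""
    gene_seq = gene_seq.upper()
    genome = genome.upper()
    hits = []
    # The "-" strand is searched by looking for the reverse complement
    # on the forward sequence of the target genome.
    for strand, query in (("+", gene_seq), ("-", reverse_complement(gene_seq))):
        pos = genome.find(query)
        while pos != -1:
            hits.append((pos + 1, pos + len(query), strand))
            pos = genome.find(query, pos + 1)
    return hits


# Toy example: the gene sits on the "+" strand of G1 and the "-" strand of G2.
g1 = "TTTATGGCCTAAGGG"
g2 = "GGTTAGGCCATCC"
gene = extract_gene(g1, 4, 12, "+")
print(gene)                  # ATGGCCTAA
print(find_exact(gene, g2))  # [(3, 11, '-')]
```

An exact search like this returns nothing as soon as G2 differs from G1 by a single base or indel, so for genomes that are not near-identical a sequence aligner (for example BLAST or minimap2) is probably the safer choice, since those tolerate mismatches and gaps.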
I suggest looking at the Perl API of EnsEMBL (https://m.ensembl.org/info/docs/api/core/index.html#api). There should be some way to use it to search other genomes for your desired sequence from G1 and get the positions in G2. I don't know how it handles indels, as I haven't used it myself.