5.0 years ago
Kumar
I have extracted a short sequence with a signature pattern from a genome. The short sequence may be either a promoter or a regulator of a particular gene. I know the coordinates of the short sequence within the genome, but how can I identify or predict the gene that corresponds to it? Is there any way to identify the gene associated with the short sequence?
Please see the picture illustrating the question.
As I understand this, you have the known position and/or sequence of a small genomic feature (promoter or whatever). You want to know the sequence of the gene adjacent to it?
MultiGeneBlast would work for this potentially. If you do a profile search for the promoter sequence, you can find the sequences in its environment. If you already have an example of the gene you're looking for, you can include that in the profile and that will keep the search constrained.
Thank you @Joe, I will try MultiGeneBlast as well.
I am not sure I understand the question. If you know the genomic coordinates of the sequence and the genome is annotated then you can just look up the gene(s) whose coordinates overlap with those of your sequence. If the genome is not annotated, you could try annotating the region around your sequence or simply find homologous regions that are annotated and infer your gene of interest from there.
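For an annotated genome, the coordinate lookup described above could be sketched roughly as follows. This is only an illustration: it assumes a GFF3-style annotation with 1-based inclusive coordinates and an optional trailing `##FASTA` section, and the function names, contig name, and `DEMO_` IDs are made up for the example.

```python
# Minimal sketch: find annotated features whose coordinates overlap a query interval.
# All names and the example records below are hypothetical.

def parse_gff_features(gff_text, feature_types=("CDS",)):
    """Collect features of the requested types from GFF3 text."""
    features = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more annotation lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        # Column 9 holds key=value pairs separated by semicolons
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        features.append({
            "seqid": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "id": attrs.get("ID", ""),
        })
    return features


def overlapping(features, seqid, start, end):
    """Return features on the same sequence that share at least one base with the query."""
    return [f for f in features
            if f["seqid"] == seqid and f["start"] <= end and f["end"] >= start]


# Made-up annotation, for illustration only
GFF_EXAMPLE = (
    "##gff-version 3\n"
    "contig_1\tProdigal\tCDS\t100\t900\t.\t+\t0\tID=DEMO_00001;product=hypothetical protein\n"
    "contig_1\tProdigal\tCDS\t1200\t2000\t.\t-\t0\tID=DEMO_00002;product=hypothetical protein\n"
    "##FASTA\n"
    ">contig_1\n"
    "ACGT\n"
)

FEATURES = parse_gff_features(GFF_EXAMPLE)
HITS = [f["id"] for f in overlapping(FEATURES, "contig_1", 880, 950)]
NO_HITS = overlapping(FEATURES, "contig_1", 1000, 1100)
```

A promoter usually sits next to a gene rather than inside it, so an empty overlap result (as for the second query here) is expected in many cases, and a nearest-feature search is then the more useful question.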
@Jean-Karim Heriche, your suggestion is a possible solution to my question. However, I have to do the same for multiple short sequences, so manual screening would be tedious. Could you please suggest a way to automate the process?
The details depend on what's available to you. If the annotated genome is present in Ensembl, then you could write a script using the Ensembl Perl API. If you have BED/GFF files, you could use bedtools closest or bedtools intersect.
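To show the idea behind a nearest-feature search for many short sequences at once, here is a rough pure-Python sketch. It is not bedtools and does not reproduce its distance conventions; it assumes 1-based inclusive coordinates, and the `Gene` records, motif names, and helper functions are hypothetical. With real files, something like `bedtools closest -a queries.bed -b genes.gff -d` on sorted inputs would be the more practical route.

```python
# Minimal sketch of a nearest-gene lookup for several query intervals.
# All names and records are made up for illustration.
from typing import NamedTuple


class Gene(NamedTuple):
    seqid: str
    start: int
    end: int
    strand: str
    name: str


def gap(q_start, q_end, g):
    """Distance in bases between the query and the gene; 0 if they overlap."""
    if g.start > q_end:
        return g.start - q_end
    if g.end < q_start:
        return q_start - g.end
    return 0


def is_upstream(q_start, q_end, g):
    """True if the query lies 5' of the gene, taking the gene's strand into account."""
    if g.strand == "+":
        return q_end < g.start
    if g.strand == "-":
        return q_start > g.end
    return False


def closest_gene(genes, seqid, q_start, q_end):
    """Return (gene, distance, upstream flag) for the nearest gene on the same sequence."""
    candidates = [g for g in genes if g.seqid == seqid]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: gap(q_start, q_end, g))
    return best, gap(q_start, q_end, best), is_upstream(q_start, q_end, best)


GENES = [
    Gene("contig_1", 100, 900, "+", "geneA"),
    Gene("contig_1", 1200, 2000, "-", "geneB"),
    Gene("contig_2", 50, 400, "+", "geneC"),
]

# Query intervals keyed by a label: (seqid, start, end)
QUERIES = {
    "motif_1": ("contig_1", 40, 80),
    "motif_2": ("contig_1", 2050, 2080),
    "motif_3": ("contig_1", 1000, 1030),
}

RESULTS = {name: closest_gene(GENES, *q) for name, q in QUERIES.items()}

for name, (gene, dist, upstream) in RESULTS.items():
    side = "upstream" if upstream else "not upstream"
    print(f"{name}: nearest {gene.name} ({gene.strand}), distance {dist}, {side}")
```

The strand-aware upstream flag is included because a promoter is normally expected 5' of the gene it controls; a nearest gene that the motif is not upstream of (as for `motif_3` here) may deserve a second look rather than being taken as the target.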
@Jean-Karim Heriche, yes, I have Prokka-annotated files (including GFF) for the genome in question. I will try bedtools. Thank you.