Question

How to identify the gene corresponding to a short sequence in a genome?

Asked by Kumar, 5.0 years ago

I have extracted a short sequence with a signature pattern from a genome. The short sequence could be either a promoter or a regulatory element of a particular gene. I know the coordinates of the short sequence within the genome, but how can I identify or predict the gene it corresponds to? Is there any way to identify the gene associated with such a short sequence?

Please see the illustration of the question below.

[Image: illustration of the question]

Tags: fasta, gene, genome, sequence

Last updated 18 months ago by Ram.

As I understand it, you know the position and/or sequence of a small genomic feature (a promoter or similar), and you want to find the gene adjacent to it?

MultiGeneBlast could potentially work for this. If you run a profile search with the promoter sequence, you can find the sequences in its genomic environment. If you already have an example of the gene you're looking for, you can include it in the profile, which will keep the search constrained.

Reply by Joe, 5.0 years ago

Thank you @Joe, I will try MultiGeneBlast as well.

Reply by Kumar, 5.0 years ago

I am not sure I understand the question. If you know the genomic coordinates of the sequence and the genome is annotated, you can simply look up the gene(s) whose coordinates overlap those of your sequence. If the genome is not annotated, you could try annotating the region around your sequence, or find homologous regions that are annotated and infer your gene of interest from there.

Reply by Jean-Karim Heriche, 5.0 years ago
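When the genome is annotated, the lookup described above comes down to an interval-overlap test: two intervals on the same sequence overlap when each one starts before the other ends. Here is a minimal Python sketch, assuming the gene coordinates have already been read into memory as 1-based, inclusive intervals. The Feature record, the function name and the toy genes are hypothetical, for illustration only.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """A gene annotation with 1-based, inclusive coordinates (as in GFF)."""
    seqid: str
    start: int
    end: int
    strand: str
    name: str


def overlapping_genes(seqid: str, start: int, end: int,
                      genes: List[Feature]) -> List[str]:
    """Names of genes on seqid whose interval overlaps [start, end]."""
    # Two closed intervals overlap if each starts no later than the other ends.
    return [g.name for g in genes
            if g.seqid == seqid and g.start <= end and start <= g.end]


# Hypothetical toy annotation, for illustration only.
genes = [
    Feature("contig_1", 100, 900, "+", "geneA"),
    Feature("contig_1", 1200, 2400, "-", "geneB"),
    Feature("contig_2", 50, 700, "+", "geneC"),
]

print(overlapping_genes("contig_1", 850, 1250, genes))  # ['geneA', 'geneB']
print(overlapping_genes("contig_1", 1000, 1100, genes))  # []
```

A promoter usually sits just upstream of its gene rather than inside it, so an empty result does not necessarily mean there is no associated gene; a nearest-gene search is often the more useful query in that case.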

@Jean-Karim Heriche, your suggestion would solve my problem. However, I have to do the same for many short sequences, so manual screening would be tedious. Could you suggest a way to automate the process?

Reply by Kumar, 5.0 years ago

The details depend on what is available to you. If the annotated genome is in Ensembl, you could write a script using the Ensembl Perl API. If you have BED/GFF files, you could use bedtools closest or bedtools intersect.

Reply by Jean-Karim Heriche, 5.0 years ago
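bedtools closest performs this kind of nearest-feature search natively. To illustrate the logic for a promoter-like site, here is a plain-Python sketch (not bedtools itself). It loops over many sites and, for each one, reports the nearest gene that the site lies upstream of, taking strand into account. The function name, the toy coordinates and the 500 bp cut-off are assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Feature(NamedTuple):
    """A gene annotation with 1-based, inclusive coordinates (as in GFF)."""
    seqid: str
    start: int
    end: int
    strand: str
    name: str


def downstream_gene(seqid: str, start: int, end: int, genes: List[Feature],
                    max_dist: int = 500) -> Optional[Tuple[str, int]]:
    """Return (gene name, gap in bp) for the closest gene the site lies upstream of.

    On the '+' strand the site must end before the gene starts; on the '-' strand
    it must start after the gene ends. Genes more than max_dist bp away
    (an arbitrary, assumed cut-off) are ignored; None means no candidate.
    """
    best: Optional[Tuple[str, int]] = None
    for g in genes:
        if g.seqid != seqid:
            continue
        if g.strand == "+" and end < g.start:
            gap = g.start - end - 1      # bases between site and gene start
        elif g.strand == "-" and start > g.end:
            gap = start - g.end - 1      # gene is read right-to-left
        else:
            continue                     # overlapping, downstream, or unknown strand
        if gap <= max_dist and (best is None or gap < best[1]):
            best = (g.name, gap)
    return best


# Hypothetical toy annotation and sites, for illustration only.
genes = [
    Feature("contig_1", 100, 900, "+", "geneA"),
    Feature("contig_1", 1200, 2400, "-", "geneB"),
    Feature("contig_2", 50, 700, "+", "geneC"),
]
sites = [
    ("site1", "contig_1", 40, 80),
    ("site2", "contig_1", 2450, 2480),
    ("site3", "contig_2", 900, 950),
]

for site_id, seqid, s, e in sites:
    print(site_id, downstream_gene(seqid, s, e, genes))
# site1 ('geneA', 19)
# site2 ('geneB', 49)
# site3 None
```

The strand check matters because a promoter for a gene on the minus strand lies at higher coordinates than the gene. A purely distance-based nearest match could therefore pick the wrong neighbour.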

@Jean-Karim Heriche, yes, I have Prokka annotation files (including a GFF) for this genome. I will try bedtools. Thank you.

Reply by Kumar, 5.0 years ago
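One way to prepare a Prokka annotation for bedtools is to convert the gene features in its GFF3 to BED. GFF coordinates are 1-based and inclusive, while BED starts are 0-based. Here is a minimal sketch that assumes the CDS features are the ones of interest and that each carries a locus_tag attribute (it falls back to ID otherwise). It also stops at the ##FASTA section that Prokka appends after the features. The toy records below are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string like 'ID=x;locus_tag=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gff_to_bed(lines: Iterable[str], feature_type: str = "CDS") -> Iterator[str]:
    """Yield 6-column BED lines (0-based, half-open) for one GFF3 feature type."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break                      # everything after this is sequence, not features
        if not line or line.startswith("#"):
            continue                   # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        seqid, _, _, start, end, _, strand, _, attr_field = cols
        attrs = parse_attributes(attr_field)
        name = attrs.get("locus_tag", attrs.get("ID", "."))
        # GFF start is 1-based inclusive; BED start is 0-based, so subtract 1.
        yield "\t".join([seqid, str(int(start) - 1), end, name, "0", strand])


# Hypothetical Prokka-style records, for illustration only.
example_gff = [
    "##gff-version 3\n",
    "contig_1\tProdigal:2.6\tCDS\t100\t900\t.\t+\t0\tID=ABC_00001;locus_tag=ABC_00001;product=hypothetical protein\n",
    "contig_1\tProdigal:2.6\tCDS\t1200\t2400\t.\t-\t0\tID=ABC_00002;locus_tag=ABC_00002\n",
    "##FASTA\n",
    ">contig_1\n",
]

for bed_line in gff_to_bed(example_gff):
    print(bed_line)
```

The resulting BED lines could then be written to a file, sorted, and used with a BED file of the short sequences in bedtools intersect or bedtools closest.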