Question

Gene context metrics: software/algorithm


5.7 years ago

fhsantanna ▴ 620

I want to evaluate the neighborhood of a particular gene across different bacterial lineages. The main objective is to develop a gene context metric that measures whether the physical association of a neighboring gene with a gene of interest is biologically meaningful (or, more simply, to estimate the probability that a given gene is found next to the gene of interest).

Do you know of any software that can do this?

If not, I was thinking of using the following algorithm, based on (https://www.pnas.org/content/115/23/E5307.short):

Identify the gene of interest in each genome;
Cut out a "gene island" containing the gene of interest and its neighboring sequences (10 kb upstream and downstream);
Group the neighboring genes by similarity (bidirectional BLAST hits or CD-HIT);
Create a "mock" genome without the gene island;
BLAST the consensus of each homolog group (or a representative sequence) against each database, "gene islands" and "mock genomes";
Count the hits in each database and calculate the metric:
Neighboring_metric = (hits_gene_islands_database - hits_mock_genomes) / (hits_gene_islands_database + hits_mock_genomes)
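
To make the last step concrete, here is a minimal Python sketch of the metric and of the 10 kb window used to cut the gene island. The function names, the 0-based half-open coordinates, and returning NaN when there are no hits in either database are my own assumptions for illustration, not something taken from the paper.

```python
def neighboring_metric(hits_islands: int, hits_mock: int) -> float:
    """Return (islands - mock) / (islands + mock), a value in [-1, 1]."""
    if hits_islands < 0 or hits_mock < 0:
        raise ValueError("hit counts must be non-negative")
    total = hits_islands + hits_mock
    if total == 0:
        # Assumption: no hits in either database means no evidence at all,
        # so return NaN instead of dividing by zero.
        return float("nan")
    return (hits_islands - hits_mock) / total


def island_window(gene_start: int, gene_end: int, contig_length: int,
                  flank: int = 10_000) -> tuple[int, int]:
    """Coordinates of the gene island, clamped to the contig boundaries.

    Assumes 0-based, half-open coordinates (hypothetical convention).
    """
    return max(0, gene_start - flank), min(contig_length, gene_end + flank)


if __name__ == "__main__":
    # A homolog group hit 30 times in gene islands and 10 times elsewhere.
    print(neighboring_metric(30, 10))
    # A gene near the start of a 50 kb contig: the upstream flank is truncated.
    print(island_window(2_000, 3_000, 50_000))
```

With this definition, a value of 1 means the homolog group is found only inside gene islands, -1 means it is found only in the mock genomes, and 0 means it is hit equally often in both.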

Does this sound reasonable to you?

gene context neighborhood • 913 views



Two tools that come to mind for synteny analysis are Sibelia (and, more recently, SibeliaZ): https://link.springer.com/chapter/10.1007/978-3-642-40453-5_17

And also MultiGeneBlast: http://multigeneblast.sourceforge.net/

I think the latter does something close to what you're suggesting. It considers the cumulative BLAST scores of multiple hits and, I think, also weights them by their proximity.