5.7 years ago
fhsantanna
I want to evaluate the neighborhood of a particular gene across different bacterial lineages. The main objective is to develop a gene-context metric that measures whether the physical association of a neighboring gene with a gene of interest is biologically meaningful (or, more simply, the probability that a given gene is found next to the gene of interest).
Do you know of any software that can do this?
If not, I was wondering whether I could use the following algorithm, based on https://www.pnas.org/content/115/23/E5307.short:
- Identify the gene of interest among the genomes;
- Cut a "gene island" containing the gene of interest and the neighboring sequences (10 kb upstream and downstream);
- Group the neighboring genes based on similarity (bidirectional blast-hits or cd-hit);
- Create a "mock" genome without the gene island;
- BLAST the consensus of each homolog group (or a representative sequence) against each database, "gene islands" and "mock genomes";
- Count the hits in each database and calculate the metric;
- Neighboring_metric = (hits_gene_islands_database - hits_mock_genomes) / (hits_gene_islands_database + hits_mock_genomes)
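As a rough sketch of the last step only, the metric could be computed like this in Python. The function name, the example group names, and the hit counts are hypothetical placeholders, and returning NaN when a group has no hits in either database is my own assumption about how to handle that case. The value ranges from -1 (hits only in the mock genomes) to +1 (hits only in the gene islands).

```python
def neighboring_metric(hits_islands: int, hits_mock: int) -> float:
    """Normalized difference between hits in the gene-island database
    and hits in the mock-genome database, in the range [-1, 1]."""
    if hits_islands < 0 or hits_mock < 0:
        raise ValueError("hit counts must be non-negative")
    total = hits_islands + hits_mock
    if total == 0:
        # Assumption: no hits anywhere means the metric is undefined.
        return float("nan")
    return (hits_islands - hits_mock) / total


# Hypothetical hit counts per homolog group: (gene islands, mock genomes).
example_counts = {
    "group_A": (8, 2),
    "group_B": (0, 5),
    "group_C": (3, 3),
}

for group, (islands, mock) in example_counts.items():
    print(group, neighboring_metric(islands, mock))
```

Note that the raw counts are not corrected for the very different sizes of the two databases, so some normalization may be needed before the values are compared.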
Does this sound reasonable to you?
Two tools that come to mind for synteny considerations are Sibelia (and, more recently, SibeliaZ): https://link.springer.com/chapter/10.1007/978-3-642-40453-5_17
and MultiGeneBlast: http://multigeneblast.sourceforge.net/
I think the latter does roughly what you're suggesting. It considers the cumulative BLAST scores of multiple hits and, I think, also weights them by their proximity to one another.
Thank you, I will take a look.