Hi,
Can anyone tell me how the functional distance between two gene-sets can be measured? Things that immediately come to mind are GO Term similarities, network distance etc. Any review article is very welcome.
Thank you
If you have the two gene sets in BED format, you could just run closestBed from bedtools:
closestBed -a geneset1.bed -b geneset2.bed -d > output.bed
The -d option reports the distance between each feature and its closest match.
You can also do it with BEDOPS:
closest-features --closest geneset1.bed geneset2.bed > output.bed
hth
No. closestBed will tell you, for each feature/gene in set A, the closest feature/gene in set B. However, I am not sure what you mean by 'distance' between two gene sets. If there are 100 genes in A and 50 genes in B, do you want the distance between each possible pair in A and B, so that your resulting file has 100*50 entries? You should add more details to your question and try to be clear (give an example, maybe?).
I don't know much about bedtools, nor for which kinds of biological problems the distance between genomic coordinates is useful. Just curious: if two genes are on different chromosomes, what does bedtools output?
What I have in mind is something similar to the distances between clusters used in various clustering algorithms. Since genes belong to networks, the average shortest-path distance between members of the two gene sets could be taken as a distance measure. But I guess better methods are available. The simplest distance measure would be the member overlap between the two gene sets.
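To make the two ideas above concrete, here is a minimal Python sketch, not an established method. It assumes the network is an unweighted, undirected adjacency dictionary; the function names and the toy network are made up for illustration. Overlap is expressed as a Jaccard distance, and the network measure is the mean shortest-path length over all reachable cross-set pairs.

```python
from collections import deque


def jaccard_distance(set_a, set_b):
    """1 - |A & B| / |A | B|; 0.0 means identical membership, 1.0 means no shared genes."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_shortest_path(graph, set_a, set_b):
    """Mean shortest-path length over all pairs (a, b) that are connected.

    Unreachable pairs are skipped (an assumption; one could also penalise them).
    Returns infinity if no pair is connected.
    """
    total, count = 0, 0
    for gene_a in set_a:
        dist = shortest_path_lengths(graph, gene_a)
        for gene_b in set_b:
            if gene_b in dist:
                total += dist[gene_b]
                count += 1
    return total / count if count else float("inf")


# Hypothetical toy network: a chain A - B - C - D, with E isolated.
toy_network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}
set_1 = {"A", "B"}
set_2 = {"C", "D"}

overlap_distance = jaccard_distance(set_1, set_2)
network_distance = average_shortest_path(toy_network, set_1, set_2)
```

How unreachable pairs and genes shared by both sets (distance 0) are handled changes the result noticeably, so those choices would need to be fixed before comparing many gene sets.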
BEDOPS and bedtools applications work with files in BED format.
BEDOPS works with sorted BED data, which gives it additional speed and memory benefits. Each gene record would have at least four columns: chromosome, start and stop positions, and a gene name.
A BEDOPS tool like closest-features will report, for each gene, the nearest upstream and downstream genes (or other features). The default is to report both the nearest upstream and downstream elements. Using --closest picks the nearer of the two, with one picked at random in case of ties.
You can add the --dist option to report the numerical distance between the nearest edges of the target feature and the upstream and/or downstream feature, depending on additional options. Check out --help or the online docs for more info.
Could you maybe elaborate on how distance is defined in this context please? Is it the sum of differences between the gene sequences, or a measure of differences in their ontology, or something else?
You can also ask him to define "gene" :-)
Can you elaborate on your comment a bit? Say a 'gene' is defined the same way it is defined in a gene set; how does that provide an answer to my question?
I assumed OP was referring to the gene as a structural or functional unit. Structural distance can be quantified; functional "distance", not so much. Come to think of it, could we follow the kind of algorithm that Amazon or Netflix use to tailor recommendations based on previous choices, in order to find genes functionally homologous to a gene under study?
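As a rough illustration of the recommender idea, one common building block in such systems is item-to-item cosine similarity. The sketch below treats each gene as an "item" described by a sparse feature profile; the gene names, the features, and the choice of cosine similarity are all assumptions made for illustration, not a validated measure of functional similarity.

```python
import math


def cosine_similarity(features_a, features_b):
    """Cosine of the angle between two sparse feature vectors stored as dicts."""
    shared = set(features_a) & set(features_b)
    dot = sum(features_a[key] * features_b[key] for key in shared)
    norm_a = math.sqrt(sum(value * value for value in features_a.values()))
    norm_b = math.sqrt(sum(value * value for value in features_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, profiles, top_n=3):
    """Rank all other genes by cosine similarity to the query gene's profile."""
    scores = [
        (other, cosine_similarity(profiles[query], profiles[other]))
        for other in profiles
        if other != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical profiles: 1 means the gene carries that annotation.
profiles = {
    "gene1": {"kinase": 1, "nucleus": 1},
    "gene2": {"kinase": 1, "nucleus": 1, "membrane": 1},
    "gene3": {"transporter": 1, "membrane": 1},
}

ranking = most_similar("gene1", profiles)
```

In practice the profiles could come from annotations, expression across samples, or interaction partners; which of these counts as "function" is exactly the open question in this thread.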
Something akin to "measure of differences in their ontology". In essence, I'm trying to capture the differences in their biological functions.
Oh, OK. The problem is, there is no concrete measure of functional similarity between genes. Converting ontology information to quantitative difference information would be arbitrary, at best. For example, in the simplest terms, each gene might have descriptive terms in 3 GO categories. Would you then count differences between the two sets? How would one determine if term A is similar or different from term B?
Is there any precedent for such work? Any place where distances between genes (let alone gene sets) are mentioned in quantitative terms? This information would help a lot.
There is already a lot of work on semantic similarity measures for GO terms. I'm looking for other methods, such as distances between the networks the genes constitute.
Oh. Gene networks are beyond me. I guess someone more conversant will help you out. In the meantime, might I suggest updating your question with some detail on how you are looking to gauge similarity and what you expect out of this exercise? Where open discussion might be involved, descriptive questions are always better than one-liners.
I don't think you can physically/quantitatively capture the differences. But can't you just throw the lists in DAVID or a gene set enrichment program and see how the function differs?
This question seems to be quite open-ended. Should we maybe make this a forum discussion?