Hey!
I want to calculate the distance between genes. I got the gene details from Ensembl BioMart, so I have
Gene_Name Start(bp) End(bp)
I calculated the length of each gene by subtracting start from end (bp). I then tried getting the distance between the genes as follows, for
Gene1 S1 E1
Gene2 S2 E2
Gene3 S3 E3
where S=Start(bp)
E =End(bp)
Distance between Gene1 and Gene2 = S2-E1 and
Distance between Gene2 and Gene3 = S3-E2 and so on ....
Is this an incorrect way of finding the distance? The distance values I get are quite large compared to what has been reported.
Thanks.
What you need to be specific about is what your definition of distance is. Unlike the distance between points, the distance between intervals is not standardized. It could be the distance between 5' ends, the distance between midpoints, the distance that is not covered by either gene, the maximal distance that the genes and their interstitial space cover, etc.
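To make the difference between these definitions concrete, here is a minimal Python sketch. The function names and the (start, end, strand) tuple layout are made up for illustration; it assumes 1-based inclusive coordinates with start <= end, strand given as 1 or -1, and both genes on the same chromosome.

```python
def five_prime(start, end, strand):
    # On the forward strand the 5' end is the start; on the reverse strand it is the end.
    return start if strand >= 0 else end


def interval_distances(a, b):
    """a and b are hypothetical (start, end, strand) tuples on the same chromosome."""
    s1, e1, _ = a
    s2, e2, _ = b
    return {
        # distance between the 5' ends
        "five_prime": abs(five_prime(*a) - five_prime(*b)),
        # distance between the midpoints
        "midpoint": abs((s1 + e1) / 2 - (s2 + e2) / 2),
        # bases covered by neither gene (0 if the genes touch or overlap)
        "gap": max(0, max(s1, s2) - min(e1, e2) - 1),
        # total stretch covered by both genes plus the space between them
        "span": max(e1, e2) - min(s1, s2) + 1,
    }


result = interval_distances((100, 200, 1), (301, 400, -1))
print(result)
```

For the same pair of genes the four numbers differ quite a bit, so it matters which one the reported values were based on.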
Are you just trying to find the genomic separation in bp between the genes? If so, then yes, that looks like what you would do.
Beware overlapping genes. You might even find a small gene tucked inside of a large gene's intron.
Are these genes on the same chromosome? If not, the distance can be considered to be infinite.
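A rough Python sketch of how the two caveats above (overlapping genes, different chromosomes) could be handled. It assumes the table also has a chromosome column and that coordinates are 1-based inclusive; the function name and tuple layout are hypothetical. Genes are grouped per chromosome and sorted by start before neighbours are compared, since S2-E1 only makes sense if Gene2 really is the next gene along the same chromosome.

```python
from collections import defaultdict


def adjacent_gaps(genes):
    """genes: iterable of hypothetical (name, chrom, start, end) tuples.

    Returns (left_name, right_name, gap) for neighbouring genes on the
    same chromosome; genes on different chromosomes are never paired.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    gaps = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom])  # sort by start, then end
        for (s1, e1, n1), (s2, e2, n2) in zip(ordered, ordered[1:]):
            # Overlapping neighbours would give a negative value; report 0 instead.
            gaps.append((n1, n2, max(0, s2 - e1 - 1)))
    return gaps


genes = [
    ("G2", "1", 500, 600),
    ("G1", "1", 100, 200),
    ("G3", "1", 550, 700),  # overlaps G2
    ("G4", "2", 50, 80),    # different chromosome, no neighbour
]
gaps = adjacent_gaps(genes)
print(gaps)
```

Note that this only compares each gene with the next one by start position. A small gene nested inside a large one will still make the following gap look bigger than the truly uncovered stretch, so that case would need extra handling (for example, tracking the furthest end seen so far).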
Do you have an example of a gene distance that you find much larger than the reported distance? And can you also say where the reported value comes from?
The length of a gene is not |start-end| but |start-end|+1, since both the start and the end position belong to the gene.
@Manu: Or maybe |start-end|-1, if you want to count only intergenic bases?
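A tiny worked example of the two off-by-one cases, assuming 1-based inclusive coordinates (the helper names are just for illustration):

```python
def gene_length(start, end):
    # Both endpoints are part of the gene, hence the +1.
    return abs(end - start) + 1


def intergenic_bases(end1, start2):
    # Neither endpoint is intergenic, hence the -1.
    return start2 - end1 - 1


length = gene_length(100, 200)
between = intergenic_bases(200, 301)
print(length, between)
```

So a gene at 100..200 is 101 bp long, and if the next gene starts at 301 there are 100 bases (201..300) between them.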