Distance Between The Genes.
2
1
13.3 years ago
Ss ▴ 50

Hey!

I want to calculate the distance between genes. I got the gene details from Ensembl BioMart, so I have

Gene_Name Start(bp) End(bp)

I calculated the length of each gene by subtracting the start from the end (bp). I then tried to get the distance between genes as follows, for

Gene1 S1 E1

Gene2 S2 E2

Gene3 S3 E3

where S=Start(bp)

E =End(bp)


Distance between Gene1 and Gene2 = S2-E1 and

Distance between Gene2 and Gene3 = S3-E2 and so on ....


Is this an incorrect way of finding the distance? The distance values I get are quite large compared to what has been reported.

Thanks.

gene distance • 4.5k views
ADD COMMENT
4

What you need to be specific about is what your definition of distance is. Unlike the distance between points, the distance between intervals is not standardized. It could be the distance between 5' ends, the distance between midpoints, the distance that is not covered by either gene, the maximal distance that the genes and their interstitial space cover, etc.
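To make the alternatives above concrete, here is a minimal sketch that computes each of them for one pair of genes. It assumes Ensembl-style coordinates (1-based, inclusive, start <= end regardless of strand) and two genes on the same chromosome; the `Gene` tuple, the function names and the example coordinates are made up for illustration.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive; assumed start <= end
    end: int
    strand: int  # +1 or -1


def five_prime(g: Gene) -> int:
    """5' end: the start on the + strand, the end on the - strand."""
    return g.start if g.strand == 1 else g.end


def distances(a: Gene, b: Gene) -> dict:
    """Several possible 'distances' between two genes on one chromosome."""
    left, right = sorted((a, b), key=lambda g: g.start)
    return {
        "five_prime": abs(five_prime(a) - five_prime(b)),
        "midpoint": abs((a.start + a.end) / 2 - (b.start + b.end) / 2),
        # bases covered by neither gene; 0 if the genes touch or overlap
        "gap": max(0, right.start - left.end - 1),
        # full extent of both genes plus the space between them
        "span": max(a.end, b.end) - min(a.start, b.start) + 1,
    }


# Hypothetical example genes
g1 = Gene("gene1", 1, 1000, 1)
g2 = Gene("gene2", 1999, 3000, -1)
print(distances(g1, g2))
```

For the same pair the four definitions give quite different numbers, which may be one reason a computed value disagrees with a reported one.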

ADD REPLY
1

Are you just trying to find genomic separation in bp's between the genes? If so then yes, that looks like what you would do.

ADD REPLY
1

Beware overlapping genes. You might even find a small gene tucked inside of a large gene's intron.

ADD REPLY
1

Are these genes on the same chromosome? If not, the distance can be considered to be infinite.
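Putting the chromosome and overlap caveats from these comments together, a rough sketch of a neighbour-gap calculation could look like the following. It assumes each record is a `(name, chrom, start, end)` tuple with 1-based inclusive coordinates; the function name and the example genes are hypothetical, and this is only one way to handle nested genes.

```python
from collections import defaultdict


def neighbour_gaps(genes):
    """Return (left_gene, right_gene, intergenic_bases) per chromosome.

    Genes are compared only within a chromosome, after sorting by start.
    The reference gene is the one reaching furthest to the right so far,
    so a small gene nested inside a large one does not create a false gap.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    gaps = []
    for chrom in sorted(by_chrom):
        rows = sorted(by_chrom[chrom])
        prev_start, prev_end, prev_name = rows[0]
        for start, end, name in rows[1:]:
            # overlapping or nested genes get a gap of 0
            gaps.append((prev_name, name, max(0, start - prev_end - 1)))
            if end > prev_end:
                prev_end, prev_name = end, name
    return gaps


# Hypothetical data: B sits inside A, D is alone on chromosome 2
genes = [
    ("A", "1", 100, 5000),
    ("B", "1", 200, 300),
    ("C", "1", 6000, 7000),
    ("D", "2", 50, 80),
]
print(neighbour_gaps(genes))
```

Without tracking the furthest end, the pair B and C would be reported as 5699 bases apart, although A covers most of that stretch.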

ADD REPLY
1

Do you have an example of a gene distance that for you is much larger than the reported distance? And can you also refer to where you got this reference from?

ADD REPLY
0

the length of a gene is not |start-end| but |start-end|+1

ADD REPLY
0

@Manu: Or maybe |start-end|-1, if you want to count only intergenic bases?

ADD REPLY
1
12.9 years ago
ff.cc.cc ★ 1.3k
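    # Assumes start <= end for every gene (as in Ensembl)
    if S1 > S2:
        # make gene1 the leftmost gene
        S1, E1, S2, E2 = S2, E2, S1, E1
    if E1 < S2:
        # genes do not overlap
        D = S2 - E1
        Overlap = 0
    elif E1 < E2:
        # partial overlap
        D = 0  # or S2-S1 if more interesting to your study
        Overlap = E1 - S2
    else:
        # gene2 lies entirely inside gene1
        D = 0  # or S2-S1
        Overlap = E2 - S2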
ADD COMMENT
0
13.3 years ago
Rm 8.3k

yes, if both are on the same strand;

___S1----->E1____________S2------->E2_____

if gene1 is on the + strand and gene2 is on the - strand then

___S1----->E1____________E2<-------S2_____

then distance between genes will be E2-E1

ADD COMMENT
3

No, this is not true. In Ensembl the start coordinate of a gene is by definition smaller than the end coordinate, irrespective of the strand. So, the way SS calculates the distances is correct.

ADD REPLY
2

Remember that distance is not equal to the number of bases between the genes. If E1=1000 and S2=1999, there are S2-E1-1 or 998 bases of intergenic sequence here. If S1=1 and E1=1000, the gene is 1000 (not 999) bp in length.
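The off-by-one arithmetic above can be checked with a couple of one-line helpers. This is just a sketch assuming 1-based, inclusive coordinates; the function names are made up.

```python
def gene_length(start: int, end: int) -> int:
    """Bases covered by a gene; both end coordinates are counted."""
    return abs(end - start) + 1


def intergenic_bases(e1: int, s2: int) -> int:
    """Bases strictly between the end of gene1 and the start of gene2."""
    return s2 - e1 - 1


print(gene_length(1, 1000))          # 1000
print(intergenic_bases(1000, 1999))  # 998
```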

ADD REPLY
0

@bret; thanks for the info...

ADD REPLY
