Hello,
I am computing RPKMs and now I would like to calculate the value "L = length of the feature in kb" (i.e. the length of the transcribed gene) from the unique exon lengths.
For example: from BioMart I retrieved, for each gene, the length of each exon computed from its start and end positions. Since each gene has several isoforms, the same exon appears more than once (once for each transcript isoform that contains it). I want to calculate the length of the transcribed gene by counting each exon only once.
This is an example for one gene:
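For reference, here is a minimal Python sketch of where L enters the calculation, assuming the usual definition of RPKM (reads per kilobase of feature per million mapped reads). The function name `rpkm` and the example numbers are made up for illustration only.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    length_kb = feature_length_bp / 1_000             # L, the feature length in kb
    mapped_millions = total_mapped_reads / 1_000_000  # library size in millions
    return read_count / (length_kb * mapped_millions)


# Hypothetical numbers: 500 reads on a 2,000 bp feature, 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)
```

Because L is in the denominator, overcounting the exon length would lower the RPKM of genes with many isoforms.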
FBgn0000008 37
FBgn0000008 251
FBgn0000008 41
FBgn0000008 789
FBgn0000008 212
FBgn0000008 1473
FBgn0000008 1207
FBgn0000008 170
FBgn0000008 117
FBgn0000008 337
FBgn0000008 251
FBgn0000008 41
FBgn0000008 789
FBgn0000008 212
FBgn0000008 1473
FBgn0000008 1207
FBgn0000008 170
FBgn0000008 117
FBgn0000008 217
FBgn0000008 344
FBgn0000008 41
FBgn0000008 789
FBgn0000008 212
FBgn0000008 1473
FBgn0000008 1207
FBgn0000008 170
FBgn0000008 117
FBgn0000008 344
FBgn0000008 818
I have to obtain this value:
37 + 251 + 41 + 789 + 212 + 1473 + 1207 + 170 + 117 + 337 + 217 + 344 + 818 = 6013
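A minimal Python sketch of this "sum each distinct length once" calculation, using the rows listed above. The function name `unique_length_sum` is hypothetical. Note the assumption baked in: two different exons that happen to have the same length would be counted only once, and partially overlapping exons are not handled, so deduplicating on exon ID or on start/end coordinates is safer when those columns are available.

```python
from collections import defaultdict

# Exon lengths for FBgn0000008, one entry per exon per transcript (as listed above).
lengths = [
    37, 251, 41, 789, 212, 1473, 1207, 170, 117, 337,
    251, 41, 789, 212, 1473, 1207, 170, 117, 217, 344,
    41, 789, 212, 1473, 1207, 170, 117, 344, 818,
]
rows = [("FBgn0000008", n) for n in lengths]


def unique_length_sum(rows):
    """Sum the distinct exon lengths of each gene."""
    seen = defaultdict(set)
    for gene, length in rows:
        seen[gene].add(length)  # a set keeps each length value once
    return {gene: sum(values) for gene, values in seen.items()}


totals = unique_length_sum(rows)
```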
Any kind of idea is appreciated! Thanks
Are you looking for the cDNA length, the coding sequence length or the genomic length of the gene?
Hi,
thanks for your reply. I am looking for a way to avoid overcounting the length of overlapping exons. Since each gene has multiple transcripts, and the exons of those transcripts overlap one another, I am searching for a way to:
This is my problem. I am trying the reduce() function from the IRanges Bioconductor package, but I still have some problems.
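To illustrate the idea of merging overlapping exons before summing, here is a plain-Python sketch that collapses overlapping (or directly adjacent) intervals and adds up the merged widths. It is not the IRanges implementation, only the same general idea. The function name `merged_length` and the coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
def merged_length(exons):
    """Total number of bases covered by (start, end) exons, 1-based inclusive."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is None or start > cur_end + 1:
            # Gap before this exon: close the current merged block, start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# Hypothetical exons from several isoforms of one gene.
exons = [(100, 200), (150, 300), (150, 300), (500, 600)]
covered = merged_length(exons)
```

Working on coordinates rather than on lengths means that identical exons and partially overlapping exons are both counted only once per covered base.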
I posted the problem in my reply to Devon, just to make it clearer.
Cheers,
Tommi