Question

Transcript Length

3


14.1 years ago

alittleboy ▴ 220

Hi All:

I have a question about transcript length: what is a reasonable range, in base pairs, for transcript length? When I use the getlength() function in the goseq Bioconductor package, which uses the UCSC Genome Browser for each combination of genome and ID, I find a range from ~300 bp to ~80,000 bp. Is a transcript that long reasonable?

Sorry, I don't know much about the underlying biology, but from a statistical perspective I might consider it an outlier. Is that right? Thank you very much!

transcript length • 7.6k views

ADD COMMENT • link updated 14.1 years ago by Larry_Parnell 16k • written 14.1 years ago by alittleboy ▴ 220

score 5 · Answer 1 · 2011-06-25


14.1 years ago

Spitshine ▴ 660

It depends on what you call an outlier. The human genome codes for Titin, a protein with more than 30,000 amino acids; at three nucleotides per codon, the mRNA should be >90,000 bp. So 80,000 bp is in fact a little short of the maximum, although the number of protein-coding transcripts that long is small.
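The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a lower bound from the coding sequence; the function name and the `include_stop` flag are illustrative choices, and UTRs (which make real mRNAs longer) are ignored.

```python
CODON_LENGTH = 3  # nucleotides per amino acid


def min_coding_length(n_amino_acids: int, include_stop: bool = True) -> int:
    """Lower bound on mRNA length (nt) implied by protein length alone.

    Counts only the coding sequence: one codon per amino acid, plus an
    optional stop codon. UTRs are deliberately left out.
    """
    codons = n_amino_acids + (1 if include_stop else 0)
    return codons * CODON_LENGTH


# A protein of 30,000 amino acids, the round figure used in the answer above.
print(min_coding_length(30_000))                      # 90003
print(min_coding_length(30_000, include_stop=False))  # 90000
```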


score 0 · Answer 2 · 2011-06-27

You should not be overly concerned about outliers when using RefSeq, Ensembl or HAVANA standards to define gene and transcript coordinates, and hence their lengths. Titin is a great example (+1): isoform NM_133378 is 101,520 bp long. Note the RefSeq accession. Keep in mind that non-protein-coding genes may have a very different length distribution: microRNAs are quite short and lincRNAs can be long. Transcribed pseudogenes would generally be shorter than the functional version of that gene.
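For the statistical side of the question, one option is to look at lengths on a log scale before calling anything an outlier, since transcript lengths are strongly right-skewed. The sketch below is not part of goseq. It computes a robust score (median and MAD of log10 length) for each transcript. Only 300, 80,000 and 101,520 come from this thread; the other lengths are made up for illustration, and the function name is hypothetical.

```python
import math
import statistics


def robust_log_scores(lengths):
    """Robust z-like score of each length, computed on log10(length).

    Uses the median and the median absolute deviation (MAD) instead of
    the mean and standard deviation, so a few very long transcripts do
    not dominate the scale. Assumes the MAD is non-zero.
    """
    logs = [math.log10(x) for x in lengths]
    med = statistics.median(logs)
    mad = statistics.median([abs(v - med) for v in logs])
    # 1.4826 rescales the MAD to match the standard deviation
    # of a normal distribution.
    return [(v - med) / (1.4826 * mad) for v in logs]


# Illustrative lengths in bp; only 300, 80000 and 101520 are from the thread.
lengths = [300, 900, 1500, 2200, 3000, 4500, 7000, 12000, 80000, 101520]

for length, score in zip(lengths, robust_log_scores(lengths)):
    print(f"{length:>7} bp  score = {score:+.2f}")
```

With real data, the lengths returned by getlength() would replace the illustrative list; any cut-off applied to the scores is a judgment call rather than a rule.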