Average length of mRNA on NCBI annotation seems too high
11 weeks ago
yh1126 • 0

I noticed that the average length of all mRNA transcripts in the human (GRCh38) annotation is 90043.2 bp, which seems unusually high. As far as I am aware, the average mRNA length should be about 3,000-4,000 bp.

Is there an explanation for this?

I calculated the average length from all records in the GRCh38 RefSeq annotation whose column 3 (feature type) is “mRNA”. Below is the command that I used.

awk -v FS="\t" '$1!~"^#" && $3=="mRNA"{sum+=$5-$4;n++}END{print sum/n}' GCF_000001405.40_GRCh38.p14_genomic.gff

The “awk” that I’m using is mawk 1.3.4.

11 weeks ago

It looks like you calculated the average length of each mRNA feature as it spans the genome (start to end, so including introns), rather than the summed length of all its exons (the mature, biological mRNA, which is what you wanted or assumed you had calculated).

Instead, sum the lengths of all the exon entries belonging to each mRNA in your file, then average those per-transcript totals.
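A minimal sketch of that per-transcript calculation, written in Python rather than awk so the grouping step is easier to follow. It assumes that each exon row names its transcript in the `Parent=` attribute and that each mRNA row carries a matching `ID=` attribute, as the GFF3 format specifies. The function name `mean_spliced_length` is made up for this illustration. Note that GFF3 coordinates are 1-based and inclusive, so a feature's length is end - start + 1.

```python
from collections import defaultdict


def mean_spliced_length(gff_lines):
    """Return the mean summed exon length over all mRNA features.

    gff_lines is any iterable of GFF3 text lines (e.g. an open file).
    """
    mrna_ids = set()
    exon_total = defaultdict(int)  # parent transcript ID -> summed exon length

    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if ftype == "mRNA" and "ID" in attrs:
            mrna_ids.add(attrs["ID"])
        elif ftype == "exon" and "Parent" in attrs:
            # Coordinates are 1-based and inclusive, hence the +1.
            for parent in attrs["Parent"].split(","):
                exon_total[parent] += end - start + 1

    # Only keep transcripts typed as mRNA (exons of other RNA types are ignored).
    lengths = [exon_total[t] for t in mrna_ids if t in exon_total]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

To try it on the annotation from the question, open the GFF file and pass the file handle to `mean_spliced_length`. The result includes UTRs, since exon features cover them; summing CDS rows instead would give the coding length only.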
