The tendency for longer genes (and their GO categories, etc.) to be more likely to be classified as differentially expressed in RNA-Seq is well known, but I'm wondering what people commonly do to account for it. I am aware of a few methods out there (GOSeq, NOISeq, etc.), but I get the feeling that the most common approach is still to ignore it. Am I wrong in thinking this? One of the most commonly used RNA-Seq toolkits is still the 'Tuxedo Suite' and, unless I am mistaken, it does not account for length bias in any way.
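I should point out that RPKM/FPKM normalisation does not address this problem. Dividing by length makes expression values comparable between genes, but a longer transcript still contributes more reads at the same expression level, so a test has more statistical power to call it differentially expressed. Differential expression calls based on these values are therefore still more likely for longer transcripts/genes.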
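To make that concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not how the Tuxedo Suite or any other tool tests for differential expression: there is one sample per condition, equal library sizes, pure Poisson counting noise, and an exact binomial test on the two counts. The function names, the library size of 10 million reads and the read counts are all made up for the example. Both genes have the same RPKM in each condition and the same 2-fold change; only the length (and hence the raw read count) differs.

```python
from math import comb

LIBRARY_SIZE = 10_000_000  # assumed total mapped reads per sample (hypothetical)


def rpkm(count, length_bp, library_size=LIBRARY_SIZE):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * library_size)


def binom_two_sided_p(x1, x2):
    """Exact two-sided binomial test of x1 vs x2 under a 50:50 null.

    Assumes equal library sizes and Poisson counts, so that, conditional
    on the total, x1 follows Binomial(x1 + x2, 0.5) if there is no change.
    """
    n = x1 + x2
    k = min(x1, x2)
    # Probability of a split at least as lopsided as the observed one,
    # counted in the lower tail and doubled for the two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical genes: (length in bp, reads in condition A, reads in condition B)
short_gene = (500, 10, 20)
long_gene = (5000, 100, 200)

for name, (length_bp, a, b) in (("short", short_gene), ("long", long_gene)):
    print(
        name,
        "RPKM A/B:", rpkm(a, length_bp), rpkm(b, length_bp),
        "p-value:", binom_two_sided_p(a, b),
    )

p_short = binom_two_sided_p(short_gene[1], short_gene[2])
p_long = binom_two_sided_p(long_gene[1], long_gene[2])
```

In this toy setup the short gene does not reach p < 0.05 while the long gene is significant by a wide margin, even though the RPKM values and the fold change are identical. Length normalisation rescales the estimate but does not change how much evidence (how many reads) is behind it.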
All comments welcome.
So if it isn't accounted for, shouldn't it be? And what would the ideal method be? I'm not clear on why correcting for transcript length wouldn't fix this.