Question

When do we need to normalize for GC-content and/or length bias in RNA-Seq reads?

24 months ago

pilargmarch ▴ 110

Hi! This has been a conundrum for me these past months. There are some packages like cqn (conditional quantile normalization) and EDASeq that can be used to normalize for sample-specific gene GC content and/or length biases, which can alter functional enrichment analysis results.

My question is, when is it appropriate to use these normalization techniques? I have some GSEA results that change drastically after normalizing with cqn, going from 17 to 109 significant GO terms, but I'm not really sure if it's correct to do this.

Thanks for reading :)

gc-content normalization RNA-seq bias edaseq • 718 views

updated 23 months ago by Ram 44k • written 24 months ago by pilargmarch ▴ 110

score 1 · Answer 1 · 2022-11-28

To be honest, I don't think anyone knows what is right and what is wrong - it is a bit of a wild west out there, everyone swinging.

I would plot the distribution of the p-values and generate heatmaps and PCA plots, both before and after normalization, to try to understand whether the process improved the data or introduced unwanted artifacts.
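As a rough illustration of the p-value check, here is a minimal sketch in plain Python that bins p-values and prints a text histogram. The function names (`pvalue_histogram`, `fraction_below`, `show`) are made up for this example, and the p-values are synthetic; in practice you would feed in the p-value column from your own DE results, once for the raw run and once for the cqn-normalized run, and compare the two shapes. A well-behaved histogram is usually roughly flat with a peak near zero, so a hump in the middle or a spike near 1 after normalization would be a reason for suspicion.

```python
import random


def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # p == 1.0 is placed in the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def fraction_below(pvalues, threshold=0.05):
    """Fraction of p-values strictly below the threshold."""
    if not pvalues:
        raise ValueError("no p-values given")
    return sum(p < threshold for p in pvalues) / len(pvalues)


def show(label, pvalues, n_bins=10, width=40):
    """Print a text histogram so two runs can be compared by eye."""
    counts = pvalue_histogram(pvalues, n_bins)
    peak = max(counts) or 1
    print(label)
    for i, count in enumerate(counts):
        lo = i / n_bins
        hi = (i + 1) / n_bins
        bar = "#" * round(width * count / peak)
        print(f"  {lo:.1f}-{hi:.1f} {count:5d} {bar}")
    print(f"  fraction below 0.05: {fraction_below(pvalues):.3f}")


# Synthetic p-values for illustration only (not real results):
# 900 "null" genes with uniform p-values plus 100 genes skewed toward zero.
rng = random.Random(0)
null_p = [rng.random() for _ in range(900)]
signal_p = [rng.random() ** 4 for _ in range(100)]
show("synthetic example", null_p + signal_p)
```

Calling `show` once per pipeline (raw and normalized) gives two histograms that can be compared side by side without any plotting library.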

Try to explain the changes in terms of which genes change and how the error distribution shifts, rather than in terms of the GO terms' enrichment. (I will admit that I am not sure whether these corrections are applied before or after the DE detection runs.)
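One way to look at the gene-level changes is simply to compare the sets of significant genes from the two runs. The sketch below is only an assumption about how one might do that: `compare_gene_sets` is a hypothetical helper, and the gene identifiers are placeholders, not results from any real analysis.

```python
def compare_gene_sets(before, after):
    """Summarize how a list of significant genes changed between two runs."""
    before, after = set(before), set(after)
    union = before | after
    return {
        "shared": sorted(before & after),   # significant in both runs
        "lost": sorted(before - after),     # significant only before normalization
        "gained": sorted(after - before),   # significant only after normalization
        # Jaccard index: 1.0 means identical sets, 0.0 means no overlap
        "jaccard": len(before & after) / len(union) if union else 1.0,
    }


# Placeholder gene IDs for illustration only
raw_hits = ["geneA", "geneB", "geneC"]
cqn_hits = ["geneB", "geneC", "geneD", "geneE"]

summary = compare_gene_sets(raw_hits, cqn_hits)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The "gained" and "lost" lists are the genes worth inspecting individually, for example by checking whether they are unusually GC-rich or unusually long, before reading anything into the GO terms.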