When I was reading this paper:
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-480
I realized that it used to be believed that "for a given gene, the GC-content effect was the same across samples and hence would cancel out when considering DE statistics such as count ratios." That belief is now disputed, and the authors say that biases due to GC content should be normalized before DE analysis.
Now, I have a table of raw read counts, which I analyzed without controlling for GC content. I know that such an effect can be absorbed into the sample-specific sequencing depth if only a single sample is sequenced in each lane. My data come from an experiment in which two samples were sequenced in each lane. How can I normalize the data if all I have is the table of raw read counts? Is it OK if I don't adjust for the effect of GC content and normalize my data only for sequencing depth?
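To make the quoted argument concrete, here is a toy sketch of when the GC effect cancels in a count ratio and when it does not. All numbers are made up, and the simple multiplicative bias model (and the helper name `observed`) are assumptions for illustration only.

```python
# Hypothetical true expression of one gene in two samples (made-up numbers).
true_a, true_b = 100.0, 200.0


def observed(true_count, gc_factor):
    """Observed count under an assumed multiplicative GC bias."""
    return true_count * gc_factor


# Case 1: the gene's GC factor is identical in both samples,
# so it cancels and the ratio equals the true fold change.
shared_factor = 0.5
ratio_shared = observed(true_b, shared_factor) / observed(true_a, shared_factor)

# Case 2: the GC factor differs between samples (sample-specific bias),
# so the ratio no longer equals the true fold change.
ratio_sample_specific = observed(true_b, 0.8) / observed(true_a, 0.5)

print("true fold change:      ", true_b / true_a)
print("shared GC factor:      ", ratio_shared)
print("sample-specific factor:", ratio_sample_specific)
```

In the first case the ratio matches the true fold change; in the second it is distorted, which is the situation the paper is concerned with.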
DESeq2 accounts for this. I assume other packages may as well.
I always assumed they don't because you're only comparing genes against each other, with the same GC content... I don't think those tools take GC content into account by default. They're also agnostic about those features, they only have counts as input...
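For what it's worth, here is a rough sketch of the median-of-ratios idea behind DESeq-style depth normalization, just to show that only the counts go in. This is a simplified illustration, not DESeq2's actual code; the function name `size_factors` and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a nonzero count in every sample (log(0) is undefined).
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across samples.
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # Size factor = median ratio of a sample's counts to the geometric means.
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_geo_mean[g]
                           for g in usable))
        for s in samples
    }


# Made-up counts: sample_B was sequenced about twice as deeply as sample_A.
counts = {
    "sample_A": [10, 200, 30, 0],
    "sample_B": [20, 400, 60, 5],
}
factors = size_factors(counts)
normalized = {s: [c / factors[s] for c in counts[s]] for s in counts}
print(factors)
print(normalized)
```

Note that nothing here knows about gene length or GC content: each sample gets a single scaling factor, so a gene-specific bias that differs between samples would not be removed by this step.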
Yeah, I thought the same as you, but when I read that paper I realized that it's essential to adjust for GC content bias. Could anyone please point me to papers suggesting that it's not essential to account for GC content bias?
There haven't been GC-bias issues for the last ~5 years. You're not going to find a paper about that, no one would bother writing it.
Thank you. Can that package control for GC content bias when you only have the table of read counts?