Question

Adjust for GC content bias in RNA-seq DE analysis.

2

7.8 years ago

statfa ▴ 790

When I was reading this paper:

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-480

I learned that it used to be believed that "for a given gene, the GC-content effect was the same across samples and hence would cancel out when considering DE statistics such as count ratios." That belief is now disputed, and the authors say that biases due to GC content should be normalized before DE analysis.

Now, I have a table of raw read counts. I analyzed the data without controlling for GC content. I know that such an effect can be absorbed into the sample-specific sequencing depth if only a single sample is sequenced in each lane. My data come from an experiment in which two samples were sequenced in each lane. How can I normalize the data if all I have is the table of raw read counts? Is it OK if I don't adjust for the effect of GC content and normalize my data only for sequencing depth?
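
To make the "cancels out" argument quoted above concrete, here is a minimal sketch with made-up numbers (the expression levels, depths, and bias factors are all hypothetical, not taken from the paper). It only shows that a multiplicative GC effect shared by both samples drops out of a depth-normalized ratio, while a sample-specific one does not.

```python
import math

def log2_fold_change(count_a, depth_a, count_b, depth_b):
    """log2 ratio of depth-normalized counts for one gene in two samples."""
    return math.log2((count_a / depth_a) / (count_b / depth_b))

# Hypothetical values for a single gene: true expression in each sample
# and the relative sequencing depth of each sample.
true_a, true_b = 200.0, 100.0
depth_a, depth_b = 1.0, 1.5

# Case 1: the GC effect on this gene is the same in both samples,
# so it cancels in the ratio and the fold change is the true one (2x).
shared_bias = 0.6
lfc_shared = log2_fold_change(true_a * depth_a * shared_bias, depth_a,
                              true_b * depth_b * shared_bias, depth_b)

# Case 2: the GC effect differs between samples (0.6 vs 0.9),
# so part of the bias ends up in the fold change.
lfc_specific = log2_fold_change(true_a * depth_a * 0.6, depth_a,
                                true_b * depth_b * 0.9, depth_b)

print(lfc_shared, lfc_specific)
```

So depth normalization alone is enough only in the first case, which is why the question comes down to whether the GC effect actually differs between samples.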

gc content normalization • 3.9k views

ADD COMMENT • link updated 7.8 years ago by Devon Ryan 105k • written 7.8 years ago by statfa ▴ 790

2

DESeq2 accounts for this. I assume other packages may as well.

ADD REPLY • link 7.8 years ago by GenoMax 150k

2

I always assumed they don't because you're only comparing genes against each other, with the same GC content... I don't think those tools take GC content into account by default. They're also agnostic about those features, they only have counts as input...

ADD REPLY • link 7.8 years ago by WouterDeCoster 47k

0

Yeah, I thought the same as you did but when I read that paper, I realized that it's essential to adjust for the GC content bias. Could anyone show me some papers where they suggest it's not essential to account for GC content bias please?

ADD REPLY • link 7.8 years ago by statfa ▴ 790

1

There haven't been GC-bias issues for the last ~5 years. You're not going to find a paper about that, no one would bother writing it.

ADD REPLY • link 7.8 years ago by Devon Ryan 105k

1

Thank you. Can that package control the GC content bias when you only have the table of read counts?

ADD REPLY • link 7.8 years ago by statfa ▴ 790

score 3 · Answer 1 · 2017-06-23

3

7.8 years ago

Devon Ryan 105k

It is rarely necessary to account for GC bias, since it's rare these days for there to be a GC bias between samples. If you're worried about that, you can use the cqn package (from Bioconductor) with DESeq2.
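
For intuition only, here is a rough sketch of what a sample-specific GC adjustment does. This is not the cqn API (cqn is an R package and fits a smooth curve rather than using bins); the function name `gc_adjust_log_counts`, the bin count, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import median

def gc_adjust_log_counts(counts, gc, n_bins=4):
    """Return log2(count + 1) values for one sample with its GC trend removed.

    Genes are split into equal-size bins by GC content, and each bin is
    shifted so that its median matches the sample-wide median. The overall
    level of the sample is kept; only the dependence on GC is flattened.
    """
    logs = [math.log2(c + 1) for c in counts]
    overall = median(logs)
    order = sorted(range(len(gc)), key=lambda i: gc[i])
    size = math.ceil(len(order) / n_bins)
    adjusted = list(logs)
    for start in range(0, len(order), size):
        members = order[start:start + size]
        shift = overall - median(logs[i] for i in members)
        for i in members:
            adjusted[i] = logs[i] + shift
    return adjusted

# Hypothetical sample in which counts rise with GC content.
gc = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
biased_counts = [25, 25, 50, 50, 100, 100, 200, 200]
adjusted = gc_adjust_log_counts(biased_counts, gc)
print([round(v, 2) for v in adjusted])
```

Note that an adjustment like this also removes any real expression signal that happens to track GC content, so it only makes sense when the GC trend clearly differs between samples.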

ADD COMMENT • link 7.8 years ago by Devon Ryan 105k

0

Yeah, I remember you once told me that. But having read these two papers, it seems that within-lane normalization for GC content is essential.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-480

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4917940/

The problem is that I have finished my analyses and am about to present my results, and only now have I realized that GC content bias should be normalized. I don't have enough time to normalize the data for GC content (if that is even possible) and repeat the analysis. That's why I asked this question: I'm looking for references I can cite to justify not accounting for GC content bias.

ADD REPLY • link 7.8 years ago by statfa ▴ 790

4

You haven't realized that GC content should be normalized, you simply think it needs to be. If you can't show a GC bias between samples then you have no indication that it should be normalized (there'd be nothing to normalize).

ADD REPLY • link 7.8 years ago by Devon Ryan 105k

0

What's the threshold for considering a GC bias? If I have a control group of samples at 44% (+/- 2) GC compared to my treated group at 41% (+/- 3), is that a bias or within the normal range? I'm having a hard time finding an answer.

ADD REPLY • link 7.4 years ago by annen ▴ 30

0

A single number doesn't exist. As I mentioned in your original thread, use the cqn package to make a diagnostic plot. If the distributions in that plot are quite different between samples, then you need to correct for it. Otherwise, you might end up correcting out a difference in expression of only a couple of transcripts.
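
If it helps to see the idea behind such a diagnostic, here is a rough stand-in, not cqn's own plot. For each sample it computes the median log count per GC bin and then compares the centered curves between samples. The helper names `gc_curve` and `max_curve_gap`, the bin count, and the toy counts are all hypothetical.

```python
import math
from statistics import median

def gc_curve(counts, gc, n_bins=4):
    """Median log2(count + 1) per equal-size GC bin for one sample."""
    order = sorted(range(len(gc)), key=lambda i: gc[i])
    size = math.ceil(len(order) / n_bins)
    bins = [order[i:i + size] for i in range(0, len(order), size)]
    return [median(math.log2(counts[i] + 1) for i in b) for b in bins]

def max_curve_gap(curve_a, curve_b):
    """Largest difference between two curves after centering each one.

    Centering removes a constant offset, so only a difference in the
    shape of the GC trend contributes to the gap.
    """
    mean_a = sum(curve_a) / len(curve_a)
    mean_b = sum(curve_b) / len(curve_b)
    return max(abs((a - mean_a) - (b - mean_b))
               for a, b in zip(curve_a, curve_b))

# Hypothetical per-gene GC fractions and counts for three samples.
gc = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
sample_a = [100] * 8                                 # flat in GC
sample_b = [200] * 8                                 # flat, higher overall
sample_c = [25, 25, 50, 50, 100, 100, 200, 200]      # rises with GC

gap_ab = max_curve_gap(gc_curve(sample_a, gc), gc_curve(sample_b, gc))
gap_ac = max_curve_gap(gc_curve(sample_a, gc), gc_curve(sample_c, gc))
print(gap_ab, gap_ac)
```

Samples that differ only in overall level give a gap near zero, while a sample with its own GC trend stands out. How large a gap matters is still a judgment call, as with the plot.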

ADD REPLY • link 7.4 years ago by Devon Ryan 105k

0

Okay, I finally managed to do that and it looks like I do in fact need to correct for GC bias?

https://ibb.co/gkBxyG https://ibb.co/g0RBsb

ADD REPLY • link 7.4 years ago by annen ▴ 30

0

Yeah, it looks like it'll benefit you.

ADD REPLY • link 7.4 years ago by Devon Ryan 105k