Question

DESeq2 normalisation: is the size of the gene taken into account?

4

9.6 years ago

Aurelie MLB ▴ 360

Hello,

I cannot quite work out whether the DESeq2 normalisation and regularized log transformation take the size of the gene into account. Do they?

It seems to me that they do not... but I am probably missing something. Do I have a bias toward long genes when I use DESeq2 to find differentially expressed genes, or when I look at expression profiles after a regularized log transformation?

Many thanks

RNA-Seq • 13k views

ADD COMMENT • link updated 2.0 years ago by Ram 44k • written 9.6 years ago by Aurelie MLB ▴ 360

Ram · Answer 1 · 2015-04-29

8

9.6 years ago

Devon Ryan 104k

No, the normalization steps don't take gene size into account, since it doesn't matter. You do not have a bias toward longer genes; rather, you have increased power to find changes in them given a constant expression level. This is a good thing, and you do not want to try to get rid of it.

If you're doing something like GO enrichment or other downstream analyses where gene length can play a role, then you should account for it there (see, for example, the GOseq package).
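To make the first point concrete, here is a simplified sketch of median-of-ratios size-factor estimation, the approach DESeq2's documentation describes for its normalization. It is plain Python written for illustration, not DESeq2's actual code, and the function name `size_factors` and the toy counts are assumptions. Note that gene length is never an input.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; `counts` is a list of per-gene rows."""
    n_samples = len(counts[0])
    # Keep genes with a nonzero count in every sample so the logs are finite.
    expressed = [row for row in counts if all(c > 0 for c in row)]
    log_ratios = [[] for _ in range(n_samples)]
    for row in expressed:
        logs = [math.log(c) for c in row]
        # The per-gene log geometric mean acts as a pseudo-reference sample.
        log_geo_mean = sum(logs) / n_samples
        for j, value in enumerate(logs):
            log_ratios[j].append(value - log_geo_mean)
    # Each sample's size factor is its median ratio to the reference.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (hypothetical): 4 genes x 2 samples, sample 2 sequenced twice as deeply.
counts = [[10, 20], [100, 200], [50, 100], [8, 16]]
factors = size_factors(counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
print(factors)
print(normalized)
```

After dividing by the size factors, each gene has the same normalized value in both toy samples, and the only thing that was corrected is sequencing depth.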

ADD COMMENT • link 9.6 years ago by Devon Ryan 104k

0

Thank you !

ADD REPLY • link 9.6 years ago by Aurelie MLB ▴ 360

0

I don't understand why, in section 5.3, goseq calculates the median rather than the sum of transcripts! Do you have any comments on that?

ADD REPLY • link 9.6 years ago by Parham ★ 1.6k

0

You should probably post this as a separate question.

ADD REPLY • link 9.6 years ago by Devon Ryan 104k

0

Right! Do you see it as a real issue, so that I should make a separate post about it?

ADD REPLY • link 9.6 years ago by Parham ★ 1.6k

1

Well, it's a legitimate question and unrelated to the current thread, so yes.

ADD REPLY • link 9.6 years ago by Devon Ryan 104k

0

@Devon Ryan I didn't understand why you would have increased power to find changes in them given a constant expression level. Do you mean that you want to look for higher count values in longer genes across samples? Thanks!

ADD REPLY • link updated 2.0 years ago by Ram 44k • written 9.4 years ago by ATRX ★ 1.1k

0

Longer genes have higher counts, so their relative expression levels across conditions are easier to measure.
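A rough way to see this, assuming pure Poisson counting noise (a simplification; real RNA-seq counts are overdispersed), is that the relative error of a count shrinks as the expected count grows. The gene lengths and read density below are hypothetical.

```python
import math


def poisson_cv(expected_count):
    """Coefficient of variation (sd / mean) of a Poisson-distributed count."""
    return math.sqrt(expected_count) / expected_count


reads_per_kb = 20  # hypothetical: same expression level for both genes
short_gene = poisson_cv(reads_per_kb * 1)   # 1 kb gene, about 20 reads
long_gene = poisson_cv(reads_per_kb * 10)   # 10 kb gene, about 200 reads

print(f"1 kb gene:  CV = {short_gene:.3f}")
print(f"10 kb gene: CV = {long_gene:.3f}")
```

At the same expression level, the longer gene's count is proportionally less noisy, so a given fold change stands out more clearly.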

ADD REPLY • link 9.4 years ago by Devon Ryan 104k

Ram · Answer 2 · 2015-04-29

7

9.6 years ago

Damian Kao 16k

If you are comparing the same gene among different samples, then it doesn't really matter since you will be normalizing the gene in the different samples by the same length.

If you want to compare different genes within the same sample, then gene length would matter (DESeq2 wasn't really made to do this anyway). However, depending on how you arrived at your tag counts, I don't think comparing different genes within a sample is valid.

For example, if you only considered uniquely mapped reads in generating your tag counts, then for genes with repetitive/conserved regions, you will be artificially undercounting tags for those genes.
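The point about comparing the same gene across samples can be checked with a little arithmetic: dividing both samples' counts by the same gene length leaves the fold change unchanged. The length and counts below are hypothetical, and `per_kb` is just an illustrative helper.

```python
def per_kb(count, length_bp):
    """Scale a count by gene length in kilobases."""
    return count / (length_bp / 1000)


length_bp = 3000             # hypothetical gene length
control, treated = 150, 450  # hypothetical depth-normalized counts

fc_raw = treated / control
fc_per_kb = per_kb(treated, length_bp) / per_kb(control, length_bp)

print(fc_raw, fc_per_kb)
```

Both fold changes come out the same, because the length term cancels in the ratio.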

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by Damian Kao 16k

0

Hello,

OK, thank you. I realise now why gene size is not important in comparisons between samples, and I can see why it is a problem to compare gene expression within a sample...

May I ask you another question, then? What I would actually like to do is inspect the expression of all genes within a sample, for instance to see how strongly markers are expressed in a control sample. So far, I have been applying the regularised log transformation of DESeq2 to the counts and plotting the log value (y axis) against the genes (x axis). I gather from your answer that this might be misleading... But would there be a better way? Would a classical log2 transformation of FPKM be better, as it would at least account for gene size? (And yes, I did consider only the uniquely mapped reads... :( )

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by Aurelie MLB ▴ 360

0

Are you trying to assess how abundantly a gene is expressed for experimental purposes (in situ hybs, transgenic targets)? I get that question a lot from my lab mates.

It is not an exact science, since the signal you will get from whatever marker you are using will depend on many different factors, among which the abundance of expression might not play that big a role.

What I usually end up doing for my lab mates is just ranking their candidate genes by tag counts per kb, and they can choose the top 10 genes or so. I don't have enough data to say whether there is any correlation between tag counts per kb and marker signal.
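A minimal sketch of that ranking might look like the following. The gene names, counts, and lengths are made up, and `rank_by_counts_per_kb` is a hypothetical helper, not part of any package.

```python
def rank_by_counts_per_kb(counts, lengths_bp):
    """Return (gene, counts per kb) pairs, highest first."""
    per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    return sorted(per_kb.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate genes from a single sample.
counts = {"geneA": 500, "geneB": 300, "geneC": 1200}
lengths_bp = {"geneA": 1000, "geneB": 500, "geneC": 6000}

ranking = rank_by_counts_per_kb(counts, lengths_bp)
for gene, value in ranking:
    print(gene, value)
```

Here geneC has the most reads but ranks last once its length is taken into account.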

ADD REPLY • link 9.6 years ago by Damian Kao 16k

0

Yes the purpose would be similar.

May I ask how tag counts per kb differ from FPKM? Apologies if this is a stupid question :)
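For reference, under the usual definition FPKM is fragments per kilobase of transcript per million mapped fragments, so it is the per-kb value further divided by the library size in millions. The sketch below uses hypothetical numbers and treats tag counts as fragment counts, which is an assumption. Within a single sample the extra divisor is a constant, so the gene ranking does not change.

```python
def counts_per_kb(count, length_bp):
    """Count scaled by gene length in kilobases."""
    return count / (length_bp / 1000)


def fpkm(count, length_bp, total_mapped):
    """Per-kb count further scaled by library size in millions."""
    return counts_per_kb(count, length_bp) / (total_mapped / 1e6)


# Hypothetical (count, length in bp) pairs from one sample.
genes = {"geneA": (500, 1000), "geneB": (300, 500), "geneC": (1200, 6000)}
total_mapped = 20_000_000  # hypothetical library size

by_kb = sorted(genes, key=lambda g: counts_per_kb(*genes[g]), reverse=True)
by_fpkm = sorted(genes, key=lambda g: fpkm(*genes[g], total_mapped), reverse=True)

print(by_kb)
print(by_fpkm)
```

The library-size term only matters when values from different samples are put side by side.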

ADD REPLY • link 9.6 years ago by Aurelie MLB ▴ 360

Ram · Answer 3 · 2015-04-29

2

9.6 years ago

Michael Love ★ 2.6k

We discourage cross posting the same question on multiple sites because it duplicates everyone's effort in answering your questions. At the least, please link to the Bioc support site posts.

ADD COMMENT • link 9.6 years ago by Michael Love ★ 2.6k

0

Apologies! I did not know. I posted here first and then saw that the Bioconductor support website was recommended in your documentation, so I thought it would ultimately be more appropriate to post there. All I can say now is that it will not happen again...

The Bioconductor post, with your answer, is here: https://support.bioconductor.org/p/67132/

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 9.6 years ago by Aurelie MLB ▴ 360

0

no worries. thanks for adding the link. this helps people follow the trail.

ADD REPLY • link 9.6 years ago by Michael Love ★ 2.6k