Question

How to do differential expression analysis on normalised values from a developmental series?

Posted 6.9 years ago by BehMah

Hi, I have normalised expression values (RPM) for a developmental series and want to perform differential expression analysis. Does anyone know how to do this? Thanks.

Tags: rna-seq, R

Can you elaborate a little more, e.g., by telling us which programs you have used so far to produce your normalised expression counts? Also, what are the sample size and the number of developmental stages?

If you have three or more developmental stages, then you can use, for example, a simple ANOVA (or the non-parametric Kruskal-Wallis test) to derive p-values. For pairwise comparisons, you could use a Wilcoxon signed-rank test, because the samples are paired: they come from the same cells, just at different stages of differentiation.
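As a minimal sketch of the paired case in R, assume (hypothetically) that the same five cultures were measured at two stages; the values below are invented for illustration, and position i in both vectors must refer to the same culture.

```r
# Hypothetical RPM values for one gene in the same five cultures at two stages;
# element i of stage1 and stage2 must come from the same culture (the pairing)
stage1 <- c(5.1, 6.3, 4.8, 7.0, 5.5)
stage2 <- c(7.9, 8.4, 6.1, 9.3, 7.7)

# Paired Wilcoxon signed-rank test on the differences stage2 - stage1
# (R computes an exact p-value here: n < 50, no ties, no zero differences)
res <- wilcox.test(stage2, stage1, paired = TRUE)
res$statistic  # V = 15: all five differences are positive
res$p.value    # 0.0625
```

With only five pairs, 0.0625 is the smallest two-sided p-value this exact test can return, so very small developmental series have limited power with it.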

Knowing the specifics of your analysis so far will help to decide the best strategy going forward, though.

Reply by Kevin Blighe, 6.9 years ago

Thanks, Kevin, for your time and answer. Following alignment with TopHat, I normalised reads by library depth to obtain RPM for each sample. Does ANOVA give a p-value per gene? Also, if pairwise comparison with Wilcoxon works, why do ANOVA at all?
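A minimal sketch of the RPM normalisation described here, assuming a hypothetical raw count matrix (genes in rows, samples in columns) and taking each sample's column total as its library depth. If the total number of mapped reads comes from the alignment files instead, those values would replace `lib_size`.

```r
# Hypothetical raw read counts: genes in rows, samples in columns
counts <- matrix(c(10, 200, 790,   # stageA (library depth 1000)
                    5, 100, 395),  # stageB (library depth 500)
                 nrow = 3,
                 dimnames = list(c("ncRNA1", "ncRNA2", "ncRNA3"),
                                 c("stageA", "stageB")))

# Library depth per sample (assumption: column sums of this matrix)
lib_size <- colSums(counts)

# RPM = count / library depth * 1e6; sweep() divides each column by its depth
rpm <- sweep(counts, 2, lib_size, "/") * 1e6
rpm
```

Both samples end up with identical RPM values because stageB is simply a half-depth copy of stageA, which is exactly the depth effect RPM is meant to remove.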

Reply by BehMah, 6.9 years ago

So, you are using Cufflinks? Does Cufflinks not have its own built-in statistical tests? TopHat / Cufflinks are also very out of date; their successors are HISAT2 and StringTie.

The idea behind running both an ANOVA and a test between each pair of conditions is that they show different things: ANOVA tests for a difference across all conditions at once, whereas a pairwise test looks for a difference between just two conditions at a time. So, they are not the same thing.
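To illustrate the distinction, here is a minimal sketch for one hypothetical gene measured at three stages, treating the samples at each stage as independent (an assumption). The pairwise step uses base R's `pairwise.t.test()` with Benjamini-Hochberg adjustment; `pairwise.wilcox.test()` is its non-parametric counterpart.

```r
# Hypothetical RPM values for one gene: three stages, three samples each
expr  <- c(5.0, 5.4, 4.8,  5.2, 5.6, 5.1,  9.8, 10.4, 9.9)
stage <- factor(rep(c("early", "mid", "late"), each = 3),
                levels = c("early", "mid", "late"))

# Global test: is there ANY difference among the three stages? (one p-value)
fit <- aov(expr ~ stage)
global_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Pairwise tests: WHICH stages differ? (one p-value per pair, BH-adjusted)
pw <- pairwise.t.test(expr, stage, p.adjust.method = "BH")
global_p
pw$p.value
```

Here the global ANOVA is significant, and the pairwise tests then locate the difference: late differs from both early and mid, while early vs mid is not significant.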

Assuming that you are using TopHat / Cufflinks, though, I recommend first that you upgrade to HISAT2 / StringTie, and then that you use the statistical tools designed for that workflow (e.g., Ballgown, which works downstream of StringTie) to derive P values.

Reply by Kevin Blighe, 6.9 years ago

My pipeline differs from the usual approach because I'm doing DE for some non-coding RNAs that don't fit standard DE pipelines, so I need to do the statistics my own way; as a result, I can't use Cufflinks/Cuffdiff or DESeq/edgeR.

Does ANOVA give a p-value per gene (row), or does it just give an overall p-value for significance between groups?

If I do pairwise comparisons by t-test or Wilcoxon, would I need to run ANOVA before that?

Reply by BehMah, 6.9 years ago

The ANOVA should return a separate P value for each gene, indicating whether that gene's mean expression differs across the conditions (ANOVA assesses this by comparing between-group to within-group variance). The Wilcoxon test should likewise return one P value per gene for each pairwise comparison. It depends on how exactly you implement these, though.

Are you using R, SPSS, Prism, STATA, or something else?

By the way, you should check the distribution of your data before running these tests, e.g., via a histogram. That said, if you want to err on the side of caution, then make sure to use a Kruskal-Wallis ANOVA (non-parametric). The Wilcoxon test is non-parametric too.
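A minimal sketch of that check in base R, using simulated (hypothetical) log-normal values for a single gene. The Shapiro-Wilk test is an addition to the histogram suggested above; with few samples it has little power, so the plot is still worth inspecting.

```r
# Simulated (hypothetical) RPM values for one gene across 12 samples,
# drawn from a log-normal distribution so that they are right-skewed
set.seed(1)
expr <- rlnorm(12, meanlog = 2, sdlog = 1)

# Visual check: histograms of the raw and log2-transformed values side by side
par(mfrow = c(1, 2))
hist(expr, main = "RPM", xlab = "RPM")
hist(log2(expr + 1), main = "log2(RPM + 1)", xlab = "log2(RPM + 1)")

# Formal check: Shapiro-Wilk test; a small p-value suggests non-normality
sw_p <- shapiro.test(expr)$p.value
sw_p
```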

Reply by Kevin Blighe, 6.9 years ago

Thanks again Kevin.

I am using R for all the stats. I think that, to return a p-value for each gene, I should change aov(gene.ex ~ group, data = my_data) a bit.
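When genes are stored in rows (as in a typical expression matrix), one way to get a p-value per gene is to apply that same `aov(gene.ex ~ group)` call to each row. A minimal sketch, where the matrix `rpm` and the factor `group` are hypothetical:

```r
# Hypothetical RPM matrix: genes in rows, samples in columns
rpm <- matrix(c(5, 6, 5, 9, 8, 9,
                3, 4, 3, 3, 4, 4,
                7, 6, 8, 2, 1, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("ncRNA", 1:3), paste0("Sam", 1:6)))

# Sample grouping, in the same order as the columns of rpm
group <- factor(c("stage1", "stage1", "stage1", "stage2", "stage2", "stage2"))

# Fit aov(gene.ex ~ group) once per row and keep the F-test p-value
anova_p <- apply(rpm, 1, function(gene.ex) {
  fit <- aov(gene.ex ~ group)
  summary(fit)[[1]][["Pr(>F)"]][1]
})
anova_p
```

`apply()` keeps the row names, so `anova_p` is a named vector with one p-value per gene.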

Reply by BehMah, 6.9 years ago

I think that you should have a vector or column in a data frame that indicates the sample grouping; you will then have to perform the test for each gene, looping over the data frame. For example:

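    # Example data: one row per sample, a 'group' column plus one column per gene
    df <- data.frame(
      group = factor(c("Control", "Control", "Disease", "Disease")),
      Gene1 = c(6, 5, 9, 8),
      Gene2 = c(6, 6, 8, 6),
      Gene3 = c(8, 3, 2, 7),
      Gene4 = c(5, 4, 4, 5),
      row.names = c("Sam1", "Sam2", "Sam3", "Sam4")
    )

    # One-way ANOVA for a single gene (Gene1) across the groups
    test <- aov(Gene1 ~ group, data = df)
    summary(test)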
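Extending this to every gene, as a minimal sketch: the code below re-creates the example `df` so it runs on its own, fits the same `aov()` per gene column, and collects one p-value per gene. The Benjamini-Hochberg step via `p.adjust()` is an addition not discussed above; when many genes are tested, some multiple-testing correction is generally needed.

```r
# Same example data as above: a 'group' column plus one column per gene
df <- data.frame(
  group = factor(c("Control", "Control", "Disease", "Disease")),
  Gene1 = c(6, 5, 9, 8), Gene2 = c(6, 6, 8, 6),
  Gene3 = c(8, 3, 2, 7), Gene4 = c(5, 4, 4, 5),
  row.names = c("Sam1", "Sam2", "Sam3", "Sam4")
)

# Every column except 'group' holds one gene
genes <- setdiff(colnames(df), "group")

# Loop over genes: build the formula "<gene> ~ group" and keep the F-test p-value
anova_p <- sapply(genes, function(g) {
  fit <- aov(as.formula(paste(g, "~ group")), data = df)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Benjamini-Hochberg adjustment across all genes tested
anova_padj <- p.adjust(anova_p, method = "BH")
data.frame(p = anova_p, padj = anova_padj)
```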

Reply by Kevin Blighe, 6.9 years ago

Also note that the non-parametric alternative to ANOVA, the Kruskal-Wallis test, may be more appropriate for your data (if they are not normally distributed or sample numbers are low).
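A per-gene Kruskal-Wallis sketch in R, assuming a hypothetical matrix with genes in rows and three developmental stages of three samples each. As in the ANOVA loop, the Benjamini-Hochberg adjustment is an added assumption, not something prescribed in this thread.

```r
# Hypothetical RPM values: genes in rows, 3 developmental stages x 3 samples
rpm <- matrix(c(1, 2, 3,  4, 5, 6,  7, 8, 9,
                5, 3, 4,  4, 5, 3,  3, 4, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("ncRNA1", "ncRNA2"), paste0("Sam", 1:9)))
stage <- factor(rep(c("early", "mid", "late"), each = 3),
                levels = c("early", "mid", "late"))

# Kruskal-Wallis rank-sum test per gene (row): one p-value each
kw_p <- apply(rpm, 1, function(x) kruskal.test(x, stage)$p.value)

# Benjamini-Hochberg adjustment across genes
kw_padj <- p.adjust(kw_p, method = "BH")
kw_p
```

Note that ncRNA1, whose stages are perfectly separated, only reaches p ≈ 0.027 with three samples per stage, which shows how little power rank-based tests have at low sample numbers.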