Gene-Level Analysis Of RNA-Seq Matched Pairs Of Samples?
12.3 years ago
Ryan Thompson ★ 3.6k

I am analyzing some RNA-seq data in which we have pairs of samples from the same individuals, before and after treatment. For example, we might have 4 samples:

  • Individual 1, before treatment
  • Individual 1, after treatment
  • Individual 2, before treatment
  • Individual 2, after treatment

Unfortunately, as far as I can tell most standard RNA-seq tools will treat my data as simply a set of 2 pre-treatment samples and another set of 2 post-treatment samples, with no regard for the fact that they are matched pairs of samples from the same individuals. That is, the statistical test being performed is essentially testing for differences between two (or more) groups of unlabeled samples. In contrast, I want to test for consistent changes in response to treatment across individuals. Is there an analysis program or package for RNA-Seq data that supports matched pairs of samples like this?
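To make the distinction concrete, here is a minimal sketch comparing an unpaired and a paired t statistic for a single gene. The expression values are made up for illustration, and a plain t-test on log values is only a stand-in for the count-based models that RNA-seq packages actually fit.

```python
import math
from statistics import mean, stdev


def welch_t(a, b):
    """Unpaired (Welch) t statistic: ignores which individual each sample came from."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(b) - mean(a)) / math.sqrt(var_a / len(a) + var_b / len(b))


def paired_t(a, b):
    """Paired t statistic: tests whether within-individual changes are consistently non-zero."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical log2 expression of one gene in four individuals.
# Baseline levels differ a lot between individuals, but each one
# goes up by roughly the same amount after treatment.
before = [2.0, 5.0, 8.0, 11.0]
after = [3.0, 6.2, 8.9, 12.1]

t_unpaired = welch_t(before, after)
t_paired = paired_t(before, after)

print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

With these toy numbers the unpaired statistic is small because the between-individual spread swamps the treatment effect, while the paired statistic is large because every individual changes in the same direction by a similar amount.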

Note that for now I am not interested in alternative splicing, but rather just testing at the gene level for differential expression.

As an example of the limitation I am looking to overcome, consider this quote from the conclusion of the baySeq paper which confirms what I have said above:

... at present these methods remain limited to comparisons involving multiple groups, and are not able to account for, for example, paired samples.

It seems that at least DESeq, edgeR, and cuffdiff share the same limitation.

rna-seq • 11k views

So far I have simply performed DESeq in 1v1 mode for each individual, then collected the differential gene sets and looked for common entries in a post-process step. It's not statistically rigorous, and I'm going to look into matted's suggestion of edgeR. He said edgeR's documentation explains paired sample design!

12.3 years ago
matted 7.8k

Several packages allow more sophisticated analyses along the lines you describe, where you supply a design matrix that accounts for multiple overlapping treatments and for which individual each sample came from. I am most familiar with using edgeR for this task. See the edgeR user's guide, specifically section 2.6, "More complex experiments (glm functionality)."

The "RNA-Seq of oral carcinomas vs matched normal tissue" and "RNA-Seq of pathogen inoculated arabidopsis with batch effects" examples in the guide seem closest to the experiment you describe. With this framework, you should be able to calculate treatment-specific contrasts while allowing for individual-specific variation.
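As a rough illustration of what such a design matrix looks like for the four samples in the question, here is a small sketch in plain Python. The function name `paired_design` and the column labels are hypothetical; in R the same additive layout is what a formula like `~ individual + treatment` would conventionally produce, but this sketch is not edgeR code.

```python
def paired_design(individuals, treatments, baseline):
    """Build an additive individual + treatment design matrix.

    Returns (column_names, rows). The first individual and the baseline
    treatment level are absorbed into the intercept.
    """
    ind_levels = sorted(set(individuals))
    trt_levels = [t for t in sorted(set(treatments)) if t != baseline]
    names = (
        ["intercept"]
        + [f"individual_{i}" for i in ind_levels[1:]]
        + [f"treatment_{t}" for t in trt_levels]
    )
    rows = []
    for ind, trt in zip(individuals, treatments):
        row = [1]
        row += [int(ind == i) for i in ind_levels[1:]]  # per-individual baseline offsets
        row += [int(trt == t) for t in trt_levels]      # treatment effect shared across individuals
        rows.append(row)
    return names, rows


# The four samples from the question.
individuals = [1, 1, 2, 2]
treatments = ["before", "after", "before", "after"]

names, rows = paired_design(individuals, treatments, baseline="before")
print(names)
for row in rows:
    print(row)
```

The individual columns soak up each person's baseline expression level, so the coefficient on the last column is the treatment effect that is tested for consistency across individuals.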

10.8 years ago

What do you think about http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436821/ ? I am using it, but I still have to read through the maths.

10.8 years ago

Doesn't the usefulness of the paired info depend on you knowing there is low biological variance in the expression of the genes you are looking at?

It does little good to say "In sample 1, expression tripled between treatment and control, while in sample 2 it only doubled" unless you know that those fold changes are well outside the range of natural variation, right?

I just worry that applying sophisticated algorithms to underpowered data is going to be a wild-goose chase.

10.8 years ago

cuffdiff has the limitation that you mention; DESeq and edgeR do not. Personally, I would use a 2-way ANOVA on log2(RPKM + 0.1) values.
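For reference, a minimal sketch of that 2-way ANOVA for one gene, with individual and treatment as additive factors and no interaction term. The RPKM values are invented, and the function names are hypothetical. With only two conditions and one sample per individual per condition, the treatment F statistic equals the square of the paired t statistic, which the sketch also computes as a cross-check.

```python
import math


def log_expr(rpkm, pseudocount=0.1):
    """log2(RPKM + 0.1), as suggested above."""
    return math.log2(rpkm + pseudocount)


def treatment_f_statistic(table):
    """F statistic for treatment in an additive two-way ANOVA.

    table[i][j] is the log-expression of individual i under condition j,
    with exactly one value per cell.
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_individual = k * sum((m - grand) ** 2 for m in row_means)
    ss_treatment = n * sum((m - grand) ** 2 for m in col_means)
    ss_residual = ss_total - ss_individual - ss_treatment
    df_treatment, df_residual = k - 1, (n - 1) * (k - 1)
    return (ss_treatment / df_treatment) / (ss_residual / df_residual)


# Hypothetical RPKM values for one gene: (before, after) per individual.
rpkm = [(4.0, 9.0), (30.0, 55.0), (120.0, 260.0), (0.5, 1.4)]
table = [[log_expr(b), log_expr(a)] for b, a in rpkm]

f_stat = treatment_f_statistic(table)

# Cross-check: squared paired t statistic on the same log values.
diffs = [row[1] - row[0] for row in table]
mean_d = sum(diffs) / len(diffs)
var_d = sum((d - mean_d) ** 2 for d in diffs) / (len(diffs) - 1)
paired_t_squared = mean_d ** 2 / (var_d / len(diffs))

print(f"F = {f_stat:.3f}, paired t^2 = {paired_t_squared:.3f}")
```

This treats each gene independently and estimates its variance from very few samples, which is exactly the situation where the count-based packages try to help by sharing information across genes.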

If you are curious about how the options would compare (at least in a larger patient cohort), I ran some benchmarks with paired tumor vs. normal data.

http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html

http://bioinfo.aizeonpublishers.net/content/2013/6/285-292.html

The short answer is that I think DESeq is the best of the 3 options that you listed.
