Question

Single vs paired ends in quantification

0

Entering edit mode

6.9 years ago

mgoldste • 0

I ran RSEM-Ebseq on some data, with paired and unpaired options and got different numbers for differentially expressed genes. Why might this be?

RNA-Seq • 1.8k views

ADD COMMENT • link updated 6.5 years ago by Corentin ▴ 660 • written 6.9 years ago by mgoldste • 0

2

Entering edit mode

Please post the commands used and explain in more detail what you did.

ADD REPLY • link 6.9 years ago by h.mon 35k

0

Entering edit mode

I took the paired end reads (separate files denoted R1 and R2) for each sample time (two time periods, 3 duplicates). I ran RSEM from a script file with the paired option (so the script file contained R1 and R2) and through a separate script ran R1 and R2 individually.

I then ran ebseq either using the paired reads RSEM-1.3.1/rsem-run-ebseq GeneMat.txt 3,3 GeneMat.results

or RSEM-1.3.1/rsem-run-ebseq GeneMat.txt 6,6 GeneMat.results for unpaired,

then /RSEM-1.3.1/rsem-control-fdr --hard-threshold GeneMat.results 0.05 GeneMat.de_hard.txt for both results files individually. The genes with fdr < 0.05 was significantly more for the unpaired reads set than the paired.

ADD REPLY • link 6.9 years ago by mgoldste • 0

1

Entering edit mode

R1 and R2 reads came from the same cDNA / RNA fragment, so when you count R1 and R2 individually and sum the results, you are essentially artificially multiplying the read counts by two. What would this entails?

Imagine you toss a coin 20 times, obtaining 6 heads and 14 tails. Performing a goodness-of-fit Chi-Square tells you the coin is fair (lets use R for the test):

coinTosses <- c(6,14)
chisq.test( coinTosses, p=c(0.5,0.5) )

Chi-squared test for given probabilities

data:  coinTosses
X-squared = 3.2, df = 1, p-value = 0.07364

Now artificially multiply the values obtained by 2:

chisq.test( 2*coinTosses, p=c(0.5,0.5))

Chi-squared test for given probabilities

data:  2 * coinTosses
X-squared = 6.4, df = 1, p-value = 0.01141

ADD REPLY • link 6.9 years ago by h.mon 35k

score 1 · Answer 1 · 2018-09-27

1

Entering edit mode

6.9 years ago

Kristoffer Vitting-Seerup ★ 4.2k

If you have paired data and run in unpaired mode it will quantify the reads from each file - meaning that for all pairs you will doublecount the fragment.

ADD COMMENT • link 6.9 years ago by Kristoffer Vitting-Seerup ★ 4.2k

score 0 · Answer 2 · 2019-02-11

You might be interested in this post:

https://biowize.wordpress.com/2014/03/04/understanding-rsem-raw-read-counts-vs-expected-counts/

The first time I compared raw reads counts to RSEM’s expected counts, I encountered an unexpected trend: the expected counts were not slightly lower than the raw counts, they were consistently lower by a factor of 2. After thinking about this a bit, I considered the possibility that RSEM treats each pair of reads as a single unit given paired-end data. To confirm this, I selected a small subset of 10 million read pairs from one of my samples and ran RSEM twice: once in paired-end mode, and once in single-end mode disregarding the read pairing information. Consistent with my hypothesis, the expected counts produced in single-end mode were almost exactly 2 times to the expected counts produced in paired-end mode.