Single vs paired ends in quantification
2
0
Entering edit mode
6.3 years ago
mgoldste • 0

I ran RSEM-Ebseq on some data, with paired and unpaired options and got different numbers for differentially expressed genes. Why might this be?

RNA-Seq • 1.5k views
ADD COMMENT
2
Entering edit mode

Please post the commands used and explain in more detail what you did.

ADD REPLY
0
Entering edit mode

I took the paired end reads (separate files denoted R1 and R2) for each sample time (two time periods, 3 duplicates). I ran RSEM from a script file with the paired option (so the script file contained R1 and R2) and through a separate script ran R1 and R2 individually.

I then ran ebseq either using the paired reads RSEM-1.3.1/rsem-run-ebseq GeneMat.txt 3,3 GeneMat.results

or RSEM-1.3.1/rsem-run-ebseq GeneMat.txt 6,6 GeneMat.results for unpaired,

then /RSEM-1.3.1/rsem-control-fdr --hard-threshold GeneMat.results 0.05 GeneMat.de_hard.txt for both results files individually. The genes with fdr < 0.05 was significantly more for the unpaired reads set than the paired.

ADD REPLY
1
Entering edit mode

R1 and R2 reads came from the same cDNA / RNA fragment, so when you count R1 and R2 individually and sum the results, you are essentially artificially multiplying the read counts by two. What would this entails?

Imagine you toss a coin 20 times, obtaining 6 heads and 14 tails. Performing a goodness-of-fit Chi-Square tells you the coin is fair (lets use R for the test):

coinTosses <- c(6,14)
chisq.test( coinTosses, p=c(0.5,0.5) )
Chi-squared test for given probabilities

data:  coinTosses
X-squared = 3.2, df = 1, p-value = 0.07364
  

Now artificially multiply the values obtained by 2:

chisq.test( 2*coinTosses, p=c(0.5,0.5))
Chi-squared test for given probabilities

data:  2 * coinTosses
X-squared = 6.4, df = 1, p-value = 0.01141
  
ADD REPLY
1
Entering edit mode
6.3 years ago

If you have paired data and run in unpaired mode it will quantify the reads from each file - meaning that for all pairs you will doublecount the fragment.

ADD COMMENT
0
Entering edit mode
5.9 years ago
Corentin ▴ 610

You might be interested in this post:

https://biowize.wordpress.com/2014/03/04/understanding-rsem-raw-read-counts-vs-expected-counts/

The first time I compared raw reads counts to RSEM’s expected counts, I encountered an unexpected trend: the expected counts were not slightly lower than the raw counts, they were consistently lower by a factor of 2. After thinking about this a bit, I considered the possibility that RSEM treats each pair of reads as a single unit given paired-end data. To confirm this, I selected a small subset of 10 million read pairs from one of my samples and ran RSEM twice: once in paired-end mode, and once in single-end mode disregarding the read pairing information. Consistent with my hypothesis, the expected counts produced in single-end mode were almost exactly 2 times to the expected counts produced in paired-end mode.

ADD COMMENT

Login before adding your answer.

Traffic: 4048 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6