Question

Make artificial read count table


6.5 years ago

hougiotaejut ▴ 30

Hi,

Assume that you have a count table where there are two dependent conditions with 3 replicates each.

 Gene    R1-C1    R2-C1    R3-C1    R1-C2    R2-C2    R3-C2
 X1         43       52       38      120      131      115
 X2        250      273      260       26       35       42
 X3        112      100      120      205      200      150

To simulate data in a simple way, is it correct to make an artificial count table for DE analysis by copying the first condition into the second condition, like this?

 Gene    R1-C1    R2-C1    R3-C1    R1-C2    R2-C2    R3-C2
 X1         43       52       38       43       52       38
 X2        250      273      260      250      273      260
 X3        112      100      120      112      100      120

That way there are no DE genes. Then, to add some DE genes to the list, I would multiply the counts of some randomly chosen genes in one condition by specified fold changes (FCs).

Is that correct and acceptable?

In a paper, I read this: "To assess how the different software packages and pipelines can control false positive rates, we utilized the multiple replicates within the sample groups by constructing artificial two-group comparisons. No significant detections were expected in such mock comparisons."

I assumed they had copied the replicates the way I illustrated above, which is why I'm asking.

RNA-Seq simulation artificial • 1.5k views

updated 6.5 years ago by h.mon 35k • written 6.5 years ago by hougiotaejut ▴ 30


I would probably just shuffle the data n times.
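One way to read "shuffle" here is to permute the sample labels, leaving the counts untouched. The sketch below is only an illustration of that reading: the helper name `shuffled_labelings` is hypothetical, and the sample names are the column headers from the table in the question.

```python
import random

def shuffled_labelings(samples, n_first, n_shuffles, seed=None):
    """Return n_shuffles random assignments of samples to two groups.

    Hypothetical helper: the first n_first shuffled samples form one
    group, the remaining samples form the other.
    """
    rng = random.Random(seed)
    labelings = []
    for _ in range(n_shuffles):
        order = list(samples)   # copy so the input list is not modified
        rng.shuffle(order)      # permute the sample labels in place
        labelings.append((order[:n_first], order[n_first:]))
    return labelings

# Column names taken from the count table in the question.
samples = ["R1-C1", "R2-C1", "R3-C1", "R1-C2", "R2-C2", "R3-C2"]
labelings = shuffled_labelings(samples, n_first=3, n_shuffles=10, seed=0)
```

Each labeling would then be passed to the DE tool as if it were the real design. With only six samples there are few distinct labelings, so some of the ten will repeat or coincide with the true grouping.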

6.5 years ago by Eric Lim ★ 2.2k


I wanted to know whether I understood that paper correctly, which is why I asked my question here: it seemed strange to me to just copy and replace the replicates.

6.5 years ago by hougiotaejut ▴ 30


The quote you used is not from the paper you linked; it is from "Comparison of software packages for detecting differential expression in RNA-seq studies".

6.5 years ago by h.mon 35k

score 2 · Accepted Answer · 2018-06-29

No, that was not the approach used in the paper. What they did was to artificially split samples from one treatment into two groups, and then compare these two "groups". So, for example, the mouse RNA-seq data had 10 samples of the C57BL/6J strain, and 11 samples of the DBA/2J strain. They randomly split the 10 C57BL/6J samples into two groups of 5 samples, and tested for differential expression between these groups.
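A minimal sketch of such a mock split, assuming only sample names are needed to define the two groups. The function `mock_split` and the sample names are hypothetical and not taken from the paper; only the 10-into-5-and-5 layout follows the example above.

```python
import random

def mock_split(samples, group_size, seed=None):
    """Randomly draw two disjoint groups of group_size samples each.

    Hypothetical helper: all samples come from the same real condition,
    so no gene is expected to be differentially expressed between them.
    """
    rng = random.Random(seed)
    chosen = rng.sample(samples, 2 * group_size)  # sampling without replacement
    return chosen[:group_size], chosen[group_size:]

# Placeholder names for the 10 samples of a single strain.
samples = [f"C57BL6J_{i}" for i in range(1, 11)]
group_a, group_b = mock_split(samples, group_size=5, seed=1)
```

The count columns for `group_a` and `group_b` would then be compared with the DE tool under test; any gene called significant in such a comparison counts as a false positive. Unlike copying replicates, this keeps the real variability between samples.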