Question

Dealing with only 2 samples in RNA-SEQ

5.1 years ago

Rogerio Ribeiro ▴ 110

Hi all

For context: I have 4 different datasets of Illumina RNA-seq data (let's call them A, B, control A and control B). I know that cells from conditions A and B produce a certain metabolite. The approach will be to determine DE genes in A_vs_controlA and B_vs_controlB and see which are common. However, I only have 2 replicates of control B (which has a different origin from control A). I believe each of these samples came from a library made from several different individuals (I'm not sure how relevant that is). I know that, statistically, I need at least 3 replicates, but for several reasons it is impossible to obtain more data right now.

What are some approaches I can take to make my inferences more "robust"? Should I lower the adjusted p-value threshold to be more restrictive? Should I simulate data based on my 2 samples?
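
To make the planned comparison concrete, here is a minimal Python sketch of the "which DE genes are common" step, with a separate adjusted p-value cutoff per contrast so a stricter threshold can be tried on the B comparison. The gene names, fold changes, p-values and function names are all made up for illustration; real input would be the result tables from whatever DE tool is used. Note that a stricter cutoff only shrinks the list; it does not by itself make the underlying variance estimate from 2 replicates any more reliable.

```python
# Hypothetical DE results: gene -> (log2 fold change, adjusted p-value).
# All values are invented for illustration only.
a_vs_ctrl_a = {
    "geneA": (2.1, 0.001),
    "geneB": (-1.4, 0.030),
    "geneC": (1.8, 0.200),
    "geneD": (3.0, 0.0004),
}
b_vs_ctrl_b = {
    "geneA": (1.7, 0.004),
    "geneB": (1.2, 0.020),   # significant, but in the opposite direction
    "geneC": (2.2, 0.010),
    "geneD": (2.5, 0.040),
}


def significant(results, padj_cutoff):
    """Return {gene: direction} for genes at or below the cutoff.

    Direction is +1 for a positive log2 fold change, otherwise -1.
    """
    return {
        gene: (1 if lfc > 0 else -1)
        for gene, (lfc, padj) in results.items()
        if padj <= padj_cutoff
    }


def shared_de_genes(res_a, res_b, cutoff_a=0.05, cutoff_b=0.05):
    """Genes significant in both contrasts with the same direction of change."""
    sig_a = significant(res_a, cutoff_a)
    sig_b = significant(res_b, cutoff_b)
    common = sig_a.keys() & sig_b.keys()
    return sorted(g for g in common if sig_a[g] == sig_b[g])


# Same cutoff for both contrasts.
print(shared_de_genes(a_vs_ctrl_a, b_vs_ctrl_b))
# Stricter cutoff for the contrast that has only 2 control replicates.
print(shared_de_genes(a_vs_ctrl_a, b_vs_ctrl_b, cutoff_b=0.01))
```

Requiring the same direction of change in both contrasts is a design choice of this sketch, not something stated in the question; drop that check if opposite-direction genes are also of interest.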

Best

Transcriptomics Statistics • 1.1k views

ADD COMMENT • link updated 5.1 years ago by Antonio R. Franco ★ 5.2k • written 5.1 years ago by Rogerio Ribeiro ▴ 110

Please use Google and the search function to look for threads on unreplicated RNA-seq experiments. This has literally been discussed dozens of times. In short: your results, no matter how you twist and turn it, will not be reliable, since statistics requires replicates. More in the numerous threads you can find online.

ADD REPLY • link 5.1 years ago by ATpoint 88k

How edgeR handles no replicates

ADD REPLY • link 5.1 years ago by Antonio R. Franco ★ 5.2k

edgeR is primarily intended for use with data including biological replication. Nevertheless, RNA-Seq and ChIP-Seq are still expensive technologies, so it sometimes happens that only one library can be created for each treatment condition. In these cases there are no replicate libraries from which to estimate biological variability. In this situation, the data analyst is faced with the following choices, none of which are ideal. We do not recommend any of these choices as a satisfactory alternative for biological replication. Rather, they are the best that can be done at the analysis stage, and options 2–4 may be better than assuming that biological variability is absent. (...) Please understand that this is only our best attempt to return something useable. Reliable estimation of dispersion generally requires replicates.

Please read the manual before doing things like this. It is not a reliable method. If possible do some more replicates.
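
To illustrate why the estimate of variability is the weak point, here is a small simulation sketch in Python. It is not edgeR's negative binomial dispersion model: it assumes, purely for illustration, that log-expression of one gene is normally distributed with a known standard deviation, and then looks at how widely the estimated standard deviation scatters when it is computed from 2, 3 or 6 replicates. The function names, the true value of 0.5 and the number of trials are arbitrary choices.

```python
import random
import statistics


def sd_estimates(n_replicates, true_sd=0.5, n_trials=2000, seed=1):
    """Simulate repeated experiments and return the estimated SD from each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # One simulated experiment: n_replicates log-expression values.
        sample = [rng.gauss(0.0, true_sd) for _ in range(n_replicates)]
        estimates.append(statistics.stdev(sample))
    return estimates


def spread(values):
    """Width of the central 90% of the values."""
    ordered = sorted(values)
    low = ordered[int(0.05 * len(ordered))]
    high = ordered[int(0.95 * len(ordered)) - 1]
    return high - low


# Print how much the SD estimate scatters for each replicate count.
for n in (2, 3, 6):
    print(n, "replicates: spread of SD estimates =", round(spread(sd_estimates(n)), 2))
```

Under these assumptions the estimates from 2 replicates scatter much more widely around the true value than those from 6, which is the practical meaning of "reliable estimation of dispersion generally requires replicates".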

ADD REPLY • link 5.1 years ago by ATpoint 88k

score 0 · Answer 1 · 2020-04-04

5.1 years ago

Antonio R. Franco ★ 5.2k

Take a close read of the edgeR package documentation, which indicates what to do in your case.

In addition, though I cannot recall for certain, you may also be able to use NOISeq.

ADD COMMENT • link 5.1 years ago by Antonio R. Franco ★ 5.2k