Question

miRNA-seq DE analysis with duplicated replicates. Am I correct?

6.0 years ago

k.kathirvel93 ▴ 310

Hi everyone,

I have miRNA-seq data (one control and one infected sample) from my own study. All downstream analysis tools such as edgeR and DESeq2 (except NOISeq) need triplicates. In my case I have only one sample each for control and infected, so I duplicated the same gene counts twice for each condition to make triplicates. Now I have three samples each for control and infected. Am I right? Is this meaningful? Thanks in advance.

R next-gen • 2.7k views

ADD COMMENT • link updated 6.0 years ago by GiV17 ▴ 50 • written 6.0 years ago by k.kathirvel93 ▴ 310

Thanks for the answers, everyone. So, what would be the solution to come up with a correct DE analysis with single data without replicates?

ADD REPLY • link 6.0 years ago by k.kathirvel93 ▴ 310

a correct DE analysis with single data without replicates.

That is impossible.

ADD REPLY • link 6.0 years ago by WouterDeCoster 47k

You can plot the log2FCs for all genes, maybe with a decent cutoff to get rid of low counts, to get an idea of which genes go up or down, but this will be purely exploratory and is not reliable. Unfortunately, there is no computational framework that compensates for underpowered experiments and poor design.
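As a rough illustration of that exploratory approach, here is a minimal sketch in plain Python. The miRNA names, the counts, the cutoff of 10 reads, and the pseudocount are all made-up assumptions for the example, not values from the original data. It only ranks fold changes and says nothing about significance.

```python
import math

# Hypothetical raw counts for one control and one infected sample.
control = {"miR-21": 1500, "miR-155": 40, "miR-122": 3, "miR-16": 800}
infected = {"miR-21": 3100, "miR-155": 170, "miR-122": 12, "miR-16": 790}

MIN_COUNT = 10   # assumed cutoff: skip miRNAs with a low raw count in either sample
PSEUDOCOUNT = 1  # added on the CPM scale to avoid division by zero

def cpm(counts):
    """Scale raw counts to counts per million to correct for library size."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def exploratory_log2fc(ctrl, trt, min_count=MIN_COUNT):
    """Return log2(infected / control) on the CPM scale for miRNAs passing the cutoff."""
    ctrl_cpm, trt_cpm = cpm(ctrl), cpm(trt)
    result = {}
    for gene in ctrl:
        if ctrl[gene] < min_count or trt[gene] < min_count:
            continue  # low counts give unstable fold changes
        ratio = (trt_cpm[gene] + PSEUDOCOUNT) / (ctrl_cpm[gene] + PSEUDOCOUNT)
        result[gene] = math.log2(ratio)
    return result

log2fc = exploratory_log2fc(control, infected)
# Print miRNAs from most up-regulated to most down-regulated.
for gene, lfc in sorted(log2fc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}\t{lfc:+.2f}")
```

The resulting list is a ranking to guide follow-up experiments, nothing more; without replicates there is no way to attach a p-value to any of these numbers.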

ADD REPLY • link 6.0 years ago by ATpoint 85k

Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLY • link 6.0 years ago by WouterDeCoster 47k

score 8 · Answer 1 · 2018-12-04

is this meaningful

No, absolutely not. Artificially creating a replicate that is identical to the unreplicated dataset will lead to large numbers of false positives. The tools need replicated data to estimate the variability between replicates, in order to decide whether an observed difference in read count is consistent and most likely due to a true effect, or rather a product of technical issues. Low counts in particular suffer from high variability (and therefore high fold changes, the "mean-variance relationship"), so even though the FCs are high, the significance is low. If you now introduce an artificial replicate with the same counts, the variability is 0 and all fold changes appear to be highly reproducible, leading to higher significance than with a real replicate, and this will introduce notable type I errors (false positives). Unreplicated data are simply not good enough for sound statistical analysis, and there is no workaround other than creating more experimental (not in silico) replicates.
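To make the zero-variability problem concrete, here is a small sketch with made-up counts for a single low-count miRNA. It uses Welch's t statistic on log2(count + 1) values purely as a simple stand-in; edgeR and DESeq2 fit negative binomial models instead, so this is an assumption-laden illustration of the principle and not what those tools compute.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for b versus a; infinite when the standard error is zero."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    diff = statistics.mean(b) - statistics.mean(a)
    if se == 0:
        # No within-group variability: any nonzero difference looks infinitely certain.
        return math.copysign(math.inf, diff) if diff != 0 else 0.0
    return diff / se

def log_counts(counts):
    """Transform raw counts to the log2(count + 1) scale."""
    return [math.log2(c + 1) for c in counts]

# Hypothetical real replicates of a low-count miRNA: noisy, as low counts usually are.
real_control = log_counts([4, 11, 2])
real_infected = log_counts([9, 3, 14])

# The same miRNA with one measurement per condition copied three times.
dup_control = log_counts([4, 4, 4])
dup_infected = log_counts([9, 9, 9])

print("variance of duplicated control:", statistics.variance(dup_control))
print("t with real replicates:       ", welch_t(real_control, real_infected))
print("t with duplicated 'replicates':", welch_t(dup_control, dup_infected))
```

With real replicates the scatter within each group keeps the statistic modest; with copied samples the within-group variance is exactly zero, so every difference, however small, appears perfectly reproducible.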

score 1 · Answer 2 · 2018-12-04

6.0 years ago

popayekid55 ▴ 110

Replicates are used to calculate the significance of DE; duplicating the sample to generate replicates would not help. If you want to calculate DE, use DESeq1 which works without replicates.

ADD COMMENT • link 6.0 years ago by popayekid55 ▴ 110

If you want to calculate DE, use DESeq1 which works without replicates

There is no single valid statistical method which can confidently give you differentially expressed genes if you don't use replicates.

ADD REPLY • link 6.0 years ago by WouterDeCoster 47k

I am starting to regret that we ever offered the "blind" mode with DESeq. The existence of this feature seemed to have misled too many users into believing that it is possible to perform a sensible analysis of RNA-Seq data without replication. It was, however, always only meant as a tool to salvage what is left from a botched experiment, and most of the time this will not be much.

Source: Simon Anders (developer of DESeq)

ADD REPLY • link 6.0 years ago by ATpoint 85k

Let me guess: they were pressured by biologists/PIs?

ADD REPLY • link 6.0 years ago by Michael 55k

I would prefer not to generalize, and would rather say they were pressured by people who willingly accept poor experimental design and try to get away with it. It always depends on the circumstances. If the work is purely based on RNA-seq and you try to make big statements from an underpowered experiment, then this is clearly not OK. If you have a whole story based on well-performed, independent experiments and then use some unreplicated data with "relaxed" statistics, as just another little brick in the wall that contributes to the overall story, it might be an acceptable thing to do. In the end, statistics is not sacrosanct and is only one of many ways to show biological effects. It depends fully on the situation.

ADD REPLY • link 6.0 years ago by ATpoint 85k

score 1 · Answer 3 · 2018-12-04

6.0 years ago

GiV17 ▴ 50

If you don't have replicates, you can use NOISeq-sim, which simulates technical replicates from a multinomial distribution. Obviously, to obtain reliable statistical results you need biological replicates, as ATpoint explained well above.
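To show what "simulating technical replicates from a multinomial distribution" means in principle, here is a minimal sketch in plain Python. It is not NOISeq's implementation: the function name, the counts, the number of replicates, and the 20% resampling depth are all hypothetical choices for the example.

```python
import random
from collections import Counter

def simulate_technical_replicates(counts, n_replicates=3, depth_fraction=0.2, seed=0):
    """Draw multinomial resamples of one library.

    Each simulated replicate draws `depth_fraction` of the original read total,
    with per-miRNA probabilities proportional to the observed counts.
    """
    rng = random.Random(seed)
    genes = list(counts)
    weights = [counts[g] for g in genes]
    depth = int(sum(weights) * depth_fraction)
    replicates = []
    for _ in range(n_replicates):
        # Sampling `depth` reads with replacement is one multinomial draw.
        drawn = Counter(rng.choices(genes, weights=weights, k=depth))
        replicates.append({g: drawn.get(g, 0) for g in genes})
    return replicates

# Hypothetical counts for a single unreplicated sample.
observed = {"miR-21": 1500, "miR-155": 40, "miR-122": 3, "miR-16": 800}
simulated = simulate_technical_replicates(observed)
for i, rep in enumerate(simulated, start=1):
    print(f"simulated replicate {i}: {rep}")
```

The simulated replicates differ from each other only by sampling noise. They carry no information about biological variability between individuals, which is why they cannot replace real biological replicates.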

ADD COMMENT • link 6.0 years ago by GiV17 ▴ 50