Question

How to quantitatively identify outliers among technical and biological replicates


Asked by Grace_G, 5.8 years ago

Hi,

Data:
I have two gene count matrices (tissue A and tissue B) from which I want to identify DEGs. Both include technical replicates and biological replicates of the samples.

To get DEGs:
If there were no technical replicates for each sample, I could compare the samples directly. So I am planning to average the technical replicates to represent each sample. Is that right?
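Instead of averaging, a common way to collapse technical replicates is to sum their counts, which keeps the values as integer counts: edgeR's `sumTechReps()` and DESeq2's `collapseReplicates()` both sum technical replicate columns. The Python sketch below mimics that idea for a plain genes × libraries matrix. The function name `collapse_tech_reps` and the "one sample label per column" layout are assumptions for illustration, not part of either package.

```python
import numpy as np

def collapse_tech_reps(counts, sample_ids):
    """Sum the columns of a genes x libraries count matrix that share a sample ID.

    counts: 2-D integer array, rows = genes, columns = sequencing libraries
    sample_ids: one biological-sample label per column (hypothetical layout)
    Returns (collapsed_counts, sample_labels) with one column per sample.
    """
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    if counts.shape[1] != sample_ids.size:
        raise ValueError("need exactly one sample ID per column")
    # keep samples in order of first appearance rather than sorted order
    _, first_idx = np.unique(sample_ids, return_index=True)
    samples = sample_ids[np.sort(first_idx)]
    # sum (not average) the technical replicates, so the result stays integer counts
    collapsed = np.column_stack(
        [counts[:, sample_ids == s].sum(axis=1) for s in samples]
    )
    return collapsed, [str(s) for s in samples]
```

For example, `collapse_tech_reps([[10, 12, 30, 28], [0, 1, 5, 7]], ["S1", "S1", "S2", "S2"])` returns the matrix `[[22, 58], [1, 12]]` with labels `["S1", "S2"]`.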

PCA Visualization:
For tissue A (the same applies to tissue B):
I used the tissue A matrix to draw a PCA plot, and for each sample some of its technical replicates appear as outliers. Is there a way to identify them quantitatively, so that I can remove the outliers before calculating the average?
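One way to turn the visual PCA judgement into a number is sketched below, under several assumptions: counts are converted to log2 counts-per-million, PCA is computed with NumPy's SVD, each library's distance to the median position of its own sample's technical replicates is measured on the first `n_pcs` components, and a library is flagged when the modified z-score (Iglewicz and Hoaglin) of that distance exceeds 3.5, a common rule of thumb. The function name `flag_tech_rep_outliers` and both cutoffs are hypothetical choices, not an established method from any specific package.

```python
import numpy as np

def flag_tech_rep_outliers(counts, sample_ids, n_pcs=2, cutoff=3.5):
    """Flag technical replicates that sit far from the other replicates of the
    same biological sample in PCA space.

    counts: genes x libraries array of raw counts
    sample_ids: biological-sample label for each library (column)
    Returns the column indices whose modified z-score exceeds `cutoff`.
    """
    counts = np.asarray(counts, dtype=float)
    sample_ids = np.asarray(sample_ids)
    # log2 counts-per-million puts libraries of different depth on one scale
    cpm = counts / counts.sum(axis=0) * 1e6
    logcpm = np.log2(cpm + 1.0)
    # PCA via SVD on the libraries x genes matrix, with each gene centred
    x = logcpm.T - logcpm.T.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]  # library coordinates on the first PCs
    # distance of each library from the median position of its own sample's replicates
    dist = np.empty(len(sample_ids))
    for sid in np.unique(sample_ids):
        cols = sample_ids == sid
        centre = np.median(scores[cols], axis=0)
        dist[cols] = np.linalg.norm(scores[cols] - centre, axis=1)
    # modified z-score: 0.6745 * (d - median) / MAD, robust to the outliers themselves
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    if mad == 0:
        mad = np.finfo(float).eps  # avoid division by zero on degenerate input
    z = 0.6745 * (dist - med) / mad
    return np.flatnonzero(z > cutoff).tolist()
```

With only two technical replicates per sample, both sit at the same distance from their median, so this check needs at least three per sample to tell which replicate is the odd one out. Flagged libraries are candidates to inspect, not automatic removals.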

Any advice would be much appreciated!

Tags: rna-seq, R, next-gen



Averaging read counts is never a good idea, at least not if you want to use a tool such as limma, edgeR or DESeq2 for DEG analysis. And mixing technical and biological replicates in a linear model with these tools is, to my knowledge, very difficult (unless you have a PhD in statistics). My advice is to read the manuals of limma, edgeR and DESeq2 very carefully, follow the examples they give, and pay close attention to how they handle biological and technical replicates. I know, for instance, that with limma you can include technical replicates with duplicateCorrelation(); this is shown in chapter 18 of the limma manual (the Yoruba HapMap case study).
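A small simulation illustrates one reason averaging is a problem for count-based tools, assuming (as a simplification) that technical replicates of one sample behave like Poisson draws around the same mean. A sum of Poisson counts is again Poisson, with variance about equal to its mean, whereas the average has variance of only about mean / n_reps and is usually not an integer, so it no longer fits the count model that edgeR and DESeq2 build on. The function name and parameter values are arbitrary choices for illustration.

```python
import numpy as np

def dispersion_of_collapsed(mu=50.0, n_reps=3, n_draws=200_000, seed=1):
    """Simulate technical replicates as Poisson(mu) draws and compare how the
    summed and the averaged values relate to a Poisson (variance == mean) model."""
    rng = np.random.default_rng(seed)
    reps = rng.poisson(mu, size=(n_draws, n_reps))  # rows = genes/draws, cols = tech reps
    summed = reps.sum(axis=1)
    averaged = reps.mean(axis=1)
    return {
        # close to 1: the sum still looks like a Poisson count
        "sum_var_over_mean": summed.var() / summed.mean(),
        # close to 1 / n_reps: the average is under-dispersed relative to Poisson
        "avg_var_over_mean": averaged.var() / averaged.mean(),
        # averages are generally not whole numbers
        "avg_is_integer": bool(np.all(averaged == np.round(averaged))),
    }
```

With the defaults, the variance-to-mean ratio is close to 1 for the summed values and close to 1/3 for the averaged values.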

Reply by Benn, 5.8 years ago


Thanks a lot! That is a very helpful and practical suggestion; I will read those parts carefully. I'm not sure what "never a good idea" mainly refers to: is it that averaging produces non-integer values, which can't be used as input? Also, why not handle the outliers first and then use a t-test directly? Tools like DESeq2 have many requirements that my data may not easily meet, so that is what I'm going to try. Looking forward to your comment :)

Reply by Grace_G, 5.8 years ago


It has been shown that these tools (limma, edgeR, etc.) perform much better than a t-test for DEG analysis. Please talk with statisticians, or ask the developers of these tools directly on the Bioconductor support site. But please be aware that before asking a question there (and here too), you are expected to do some research yourself first, such as searching Google for your topic (I know, for example, that the question of why limma is better than a t-test has been asked many times before) or reading the manuals of these tools thoroughly. So my advice is to research this a bit more.
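The sketch below illustrates, under stated assumptions, the main idea behind limma's moderated t-statistic: each gene's residual variance s² (with d degrees of freedom) is shrunk toward a prior value s0² carrying d0 prior degrees of freedom, s̃² = (d0·s0² + d·s²) / (d0 + d), before t is computed. Here d0 and s0² are fixed by hand and the inputs are assumed to be log-scale expression values; limma's eBayes() instead estimates the prior from all genes, so this is not limma's implementation, and the function name is hypothetical.

```python
import numpy as np

def ordinary_and_moderated_t(group_a, group_b, d0=4.0, s0_sq=0.05):
    """Per-gene two-sample t-statistics (pooled variance) for genes x samples
    arrays, plus a limma-style moderated t with variances shrunk toward s0_sq.

    d0 and s0_sq are fixed illustrative values, not estimated from the data."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.shape[1], b.shape[1]
    d = na + nb - 2  # residual degrees of freedom per gene
    diff = a.mean(axis=1) - b.mean(axis=1)
    # pooled within-group variance for each gene
    s_sq = ((na - 1) * a.var(axis=1, ddof=1) + (nb - 1) * b.var(axis=1, ddof=1)) / d
    scale = np.sqrt(1.0 / na + 1.0 / nb)
    t_ordinary = diff / (np.sqrt(s_sq) * scale)
    # posterior (shrunken) variance: weighted mix of prior and observed variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    t_moderated = diff / (np.sqrt(s_tilde_sq) * scale)
    return t_ordinary, t_moderated
```

For a gene whose three replicates per group happen to agree almost perfectly, such as `[5.0, 5.01, 5.02]` versus `[5.1, 5.11, 5.12]`, the ordinary t is about -12.2 while the moderated t is about -0.77, because the tiny observed variance is pulled toward the prior. This stabilisation of per-gene variances is one reason such tools tend to behave better than a plain t-test when there are few replicates.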

Reply by Benn, 5.8 years ago


I see, thanks for sharing these useful ways of thinking and studying. Best wishes!

Reply by Grace_G, 5.8 years ago