How to filter RNA-seq data for transcripts that are constant across all three biological replicates
6.8 years ago
Sepd • 0

Hi everyone,

I have an RNA-seq data set with 3 biological replicates for the control and 3 biological replicates for the experiment. I used RSEM to calculate gene expression and EBSeq to calculate differential expression between the control and the experiment. Within either the control or the experiment, the expression values for a particular transcript are in some cases not the same across the 3 replicates. I have tried calculating the variance of the three controls and looking at the distribution of the variances, but that does not seem to help either. Does anyone know another way, or software, that I could use to filter out the transcripts that are not constant? I would truly appreciate any help.

Thank you

RNA-Seq sequencing • 3.0k views

Perhaps I am misunderstanding something but here goes.

"expression values are not the same across the 3 replicates for a particular transcript"

and

"Does anyone know any other way or software I could use to filter out the transcripts that are not constant?"

Isn't that the whole point of doing biological replicates? To understand the variation present.


Having variations between replicates is very normal. How different are they? Why do you need them to be constant?


There are many transcripts that have 3 very different values, for example:

                rep1        rep2        rep3
FBtr0299513     15.84127647 110.2103896 29.14164901

This causes problems when I compare the 3 control replicates with the 3 experiment replicates for differential expression. I want to filter out transcripts that look like this because I think they are not reliable. I am looking for a way to pick out transcripts that are similar in 2 out of 3 of these replicates.

These biological replicates are from flies kept under the same condition, so they should show the same expression across replicates.
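
The kind of per-transcript check being asked for here can be sketched with the coefficient of variation (standard deviation divided by the mean) of the replicate values. This is only an illustration of the idea, not a recommended filter: the 0.5 cutoff and the variable names are hypothetical, and the values are the FBtr0299513 row quoted above.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    return stdev(values) / m if m > 0 else 0.0


# Replicate values for FBtr0299513 as quoted in the thread.
replicates = {"FBtr0299513": [15.84127647, 110.2103896, 29.14164901]}

CV_THRESHOLD = 0.5  # hypothetical cutoff, chosen only for illustration

cv_by_transcript = {
    tid: coefficient_of_variation(vals) for tid, vals in replicates.items()
}
# Transcripts whose replicate spread exceeds the (hypothetical) cutoff.
flagged = [tid for tid, cv in cv_by_transcript.items() if cv > CV_THRESHOLD]

print(cv_by_transcript, flagged)
```

For this row the spread is about as large as the mean itself, so any cutoff of this kind would flag it. Whether flagging it is a good idea is a separate question.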


As @Frederike said below, most commonly used DE analysis programs take the biological variability into consideration when fitting models to the data. Take a look at the workflow she has linked below.


Thank you. Do you know if EBSeq considers this variability too? The list I am looking at contains transcripts that have been marked as differentially expressed, but when looking at the expression values of the replicates you can see they are not constant.


I would be surprised if DESeq et al. called the example that you showed above differentially expressed, although that, of course, also depends on the values from the other condition. If those values were considerably higher (e.g., 1000, 1200, 900), I'd still think that your gene in question could be identified as being significantly lower in whatever condition you showed above.

No idea what EBSeq does, but I can only repeat myself: please follow the established workflows.


I have a feeling those values are normalized but @Sepd will need to clarify.


They are normalized. I don't know how to rely on it being differentially expressed when there are big differences in the expression values. Sometimes it shows zero expression in two of the replicates, but one of the replicates has a huge value that throws off the whole calculation.

            exp1        exp2        exp3        ctrl1       ctrl2       ctrl3
FBtr0085375 0.091674054 0.030532013 0.019790594 0.010525734 167.6551102 0.010300845

So for this transcript the first three columns are the biological replicates for the experiment and the second three columns are the biological replicates for the control. This gave a log2 fold change of 6, but you can see how the 5th column is throwing it off.
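
To see how much a single replicate can drive a fold change, here is a small sketch comparing a mean-based and a median-based log2 ratio for the FBtr0085375 values above. It is not meant to reproduce EBSeq's calculation. The pseudocount of 1 and the control-over-experiment direction are assumptions made only for this illustration.

```python
from math import log2
from statistics import mean, median

# FBtr0085375 values as quoted in the thread.
experiment = [0.091674054, 0.030532013, 0.019790594]
control = [0.010525734, 167.6551102, 0.010300845]

PSEUDOCOUNT = 1.0  # assumption: avoids dividing by near-zero values


def log2_ratio(numerator, denominator, pseudocount=PSEUDOCOUNT):
    """log2 of (numerator + pseudocount) / (denominator + pseudocount)."""
    return log2((numerator + pseudocount) / (denominator + pseudocount))


# The mean is pulled up by the single large control replicate.
lfc_mean = log2_ratio(mean(control), mean(experiment))
# The median ignores that replicate and stays close to zero.
lfc_median = log2_ratio(median(control), median(experiment))

print(f"mean-based log2 ratio:   {lfc_mean:.2f}")
print(f"median-based log2 ratio: {lfc_median:.2f}")
```

The mean-based ratio is large while the median-based one is near zero, so the apparent change here rests entirely on one replicate.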


"how to rely on it being differentially expressed"

That's what the (adjusted) p-values are for.
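
As a rough illustration of what "adjusted" means, the sketch below applies the Benjamini-Hochberg procedure to a handful of made-up p-values. The thread does not say which correction any given tool applies. Benjamini-Hochberg is assumed here only because it is a common choice, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.20]  # made-up p-values for illustration
adj = benjamini_hochberg(raw)
print(adj)
```

With many thousands of transcripts tested, a raw p-value just under 0.05 can end up above 0.05 after adjustment, which is why the adjusted value is the one to look at.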


You would not want to filter that away just for that reason. What if your treated samples have values of 800, 900, 1000? Anyway, you don't want to re-invent the wheel by DIYing this. I would be shocked if your program didn't handle biological replicates appropriately, so just let it do its job. That said, I would recommend using DESeq, edgeR, or limma instead: more people use them, so it's easier to get help. DESeq takes pretty much the same input as your program does.
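
The point about 800, 900, 1000 can be made concrete with a simple two-sample comparison. The sketch below computes Welch's t statistic on log2 values for the noisy FBtr0299513 replicates from earlier in the thread against those hypothetical treated values. This is only an analogy: DESeq, edgeR and limma fit their own models rather than running a t-test like this, and the function name is made up for the example.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Noisy replicates quoted earlier in the thread (FBtr0299513).
noisy = [15.84127647, 110.2103896, 29.14164901]
# Hypothetical values for the other condition, taken from the reply above.
high = [800.0, 900.0, 1000.0]

# Compare on the log2 scale, where fold changes become differences.
t_stat = welch_t([log2(x) for x in high], [log2(x) for x in noisy])
print(f"Welch t statistic on log2 values: {t_stat:.2f}")
```

Even with the large spread in the noisy group, the gap between the two groups is several standard errors wide, so dropping the transcript for its replicate variability alone would discard a real difference.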

6.8 years ago

To identify genes that are consistently different between the 3 replicates of your controls and the 3 replicates of your experiment, use the packages that have been developed for differential gene expression analysis. limma, edgeR and DESeq2 have all been developed precisely to address the biological variability that is captured by the differences you see among your 3 replicates.

For more details, see this Bioconductor workflow description.


Thank you for your suggestion. I am looking to find transcripts that are similar across the three replicates and filter everything else out. I also want to look at the 3 control sets and the 3 experiment sets separately, so that transcripts that differ between replicates are not included in the differential expression analysis.


"to not include transcripts in the differential expression analysis that are different in replicates"

I implore you to not do this manually. This is exactly what the tools I linked above are trying to do for you -- and they've had a decade of development!


I am really trying to avoid doing it manually. I will try using these instead of EBSeq to see if it helps filter out these faulty transcripts. Thank you!
