Question

DESeq: too low p-value for RNAseq

8 months ago

doramora ▴ 10

Hello everybody!

I have a problem with the p-values calculated in my experiment. I have 8 RNA-seq samples (with 4 replicates of each sample): samples 1-4 are WT and samples 5-8 have a gene knock-down. I made one big file with all the counts and then started analyzing it with DESeq (the R code is below). I did no normalization, because I've read that DESeq does it automatically. The number of genes with p-value == 0 or p-value < e-309 is crazy. I've checked these genes in the counts and they really do differ a lot, like 90 vs. 800 reads. What could be wrong? I'm new to data analysis.

Thank you!

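library(DESeq2)
library(tibble)  # provides as_tibble()

# Build the dataset from the raw (unnormalized) count matrix
dds <- DESeqDataSetFromMatrix(countData = cts_1,
                              colData = coldata,
                              design = ~ Cell_type)
dds <- DESeq(dds)  # estimates size factors and dispersions, then fits the model
res <- results(dds, tidy = TRUE)
res <- as_tibble(res)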

R DESeq p-value RNA-seq FDR • 849 views

ADD COMMENT • link updated 7 months ago by Papyrus ★ 3.0k • written 8 months ago by doramora ▴ 10

Can you show the output of summary(res), ideally before you convert it to a tibble? We need to know what you mean by a crazy amount.

ADD REPLY • link 7 months ago by Michael 55k

Also, please clarify whether you are talking about p-values or adjusted p-values.

ADD REPLY • link 7 months ago by i.sudbery 20k

I'm talking about both; I've added an image to clarify.

ADD REPLY • link 7 months ago by doramora ▴ 10

I have 7666 genes; 5393 of them have a p-value < 0.05, and for 2876 of them the p-value is lower than 0.000000001. I calculated -log10(FDR) for my data and it ranges from 0 to 300. That scares me.

ADD REPLY • link 7 months ago by doramora ▴ 10
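The counts quoted above can be reproduced with a few lines of base R. This is only a sketch on a made-up vector `padj` (the name and values are hypothetical, not the poster's data). A p-value reported as exactly 0 usually means the number was too small to represent in double precision rather than literally zero, so `-log10()` of it is infinite; the sketch clamps such values to `.Machine$double.xmin` (about 2.2e-308) before taking the log, which is one common workaround for plotting, not something DESeq does for you.

```r
# Hypothetical adjusted p-values; the 0 stands for a value that underflowed.
padj <- c(0, 1e-300, 1e-9, 0.03, 0.5, NA)

# Count genes under each threshold, ignoring NAs.
n_below_05  <- sum(padj < 0.05, na.rm = TRUE)
n_below_1e9 <- sum(padj < 1e-9, na.rm = TRUE)

# Smallest positive normalized double, used here as a plotting floor.
floor_p <- .Machine$double.xmin

# Clamp tiny or zero p-values to the floor so -log10() stays finite.
# NA entries stay NA.
neg_log10 <- -log10(pmax(padj, floor_p))
```

With this floor, the transformed values top out just below 308, which is consistent with seeing -log10(FDR) values up to about 300.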

Looks like an in vitro experiment with cell lines, right? These can sometimes show thousands of DEGs because of the many nonspecific effects of the knock-down. Also, you say that you have "4 repeatings for each sample": are these technical replicates? If so, it may be better to add them together or model them explicitly rather than treat them as independent, as you would biological replicates.

ADD REPLY • link 7 months ago by Papyrus ★ 3.0k
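To see why treating resequenced runs as independent samples can inflate significance, here is a toy illustration. It uses a Welch t-test on made-up numbers, not DESeq's negative binomial model, and it copies each value exactly four times, which is an extreme simplification of technical replication. The vectors `wt` and `kd` are hypothetical.

```r
# Made-up expression values for 4 samples per group (not real data).
wt <- c(90, 110, 100, 120)
kd <- c(115, 135, 125, 150)

# One value per sample: 4 vs 4.
p_samples <- t.test(wt, kd)$p.value

# Each sample "resequenced" 4 times and treated as 16 independent values.
p_pseudo <- t.test(rep(wt, each = 4), rep(kd, each = 4))$p.value

# The group means are unchanged, but the apparent sample size is 4x larger,
# so the standard error shrinks and the p-value drops.
p_pseudo < p_samples
```

No new biological information was added in the second test, yet the p-value is smaller. The same mechanism can push p-values toward 0 when 32 sequencing runs are analyzed as if they were 32 independent samples.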

Could you please explain the difference between technical and biological replication? I've been provided with RNA-seq data from 8 samples, each of which was sequenced 4 times to increase accuracy. Does this constitute technical replication? If so, would it be advisable to aggregate, calculate the mean, or employ another method for each set of replicates? I apologize for any naive questions; I'm relatively new to this field and learning as I go. Additionally, if you're aware of any online courses or recommended reading materials, I would greatly appreciate it. Thus far, I've struggled to locate comprehensive information, instead piecing it together from fragmented sources.

ADD REPLY • link 7 months ago by doramora ▴ 10

Sometimes the line between biological and technical replicates is a bit fuzzy. In this case it seems you have technical replicates, because it is the same sample, resequenced. See this answer by the DESeq2 author, in which he recommends adding them together. I think there is a DESeq2 function for this (collapseReplicates), but I've not used it. Other very informative answers on handling replicates in RNA-seq: this one, and, for other workflows which allow more complex modelling (limma), this other one.

ADD REPLY • link 7 months ago by Papyrus ★ 3.0k
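As a rough sketch of what "adding them together" means, the base R snippet below sums the columns of a toy count matrix that belong to the same sample. The matrix `cts`, the run names, and `sample_id` are all made up for illustration. Counts are summed rather than averaged so that they stay as integer read counts.

```r
# Toy count matrix: 3 genes x 4 sequencing runs (made-up numbers).
cts <- matrix(c( 10, 12, 20, 22,
                  0,  1,  5,  4,
                100, 90, 50, 60),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("s1_run1", "s1_run2", "s2_run1", "s2_run2")))

# Which sample each run (column) belongs to.
sample_id <- c("s1", "s1", "s2", "s2")

# rowsum() sums rows by group, so transpose, sum per sample, transpose back.
collapsed <- t(rowsum(t(cts), group = sample_id))

collapsed
```

The result has one column per sample instead of one per run. In DESeq2 itself, `collapseReplicates(dds, groupby = dds$sample)` is meant to do the same kind of summing on a `DESeqDataSet`; the `sample` column name is an assumption here and would need to exist in `coldata`.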