Question

Can I consider these values as differentially expressed?

0

Entering edit mode

21 months ago

vitoria-ts • 0

Hello!

I'm needing some help from the more experienced ones! n_n'

I'm doing a transcriptome expression comparison using DESeq2 and I would like to be sure I'm using the right parameters. This doubt came after seeing that a gene increased the expression between different sampling points but was not included in the DESeq2 output. Here are the expression values for this gene in EST values (7developmental stages, 3 replicates each):

Gene/DevStage   S1_R1   S1_R2   S1_R3   S2_R1   S2_R2   S2_R3   S3_R1   S3_R2   S3_R3   S4_R1   S4_R2   S4_R3   S5_R1   S5_R2   S5_R3   S6_R1   S6_R2   S6_R3   S7_R1   S7_R2   S7_R3
BetaCatenin1    6875    11251   6937    6229    10730   6944    16075   18568   19081   20628   19490   20675   18251   19120   20971   18500   21838   12948   15971   23489   15423

First, can I consider this gene as being differentially expressed? If yes, how can I change my script to include similar cases in my output?

Thank you!!! :)

Here is the script I'm using (any other suggestions to fix/improve it are welcome!!! :D ):

#Step1

table_MyOrganism<- read.table("ExpValues_MyOrganism.tsv", header=TRUE, sep ="\t", row.names = 1)

rd_matrix_MyOrganism <- round(table_MyOrganism)

#Step2

dds_MyOrganism <- DESeqDataSetFromMatrix(countData = rd_matrix_MyOrganism, colData = design_MyOrganism, design = ~ condition)

dds_normal_MyOrganism <- DESeq(dds_MyOrganism)

res_MyOrganism <- results(dds_normal_MyOrganism)

mcols(res_MyOrganism, use.names = TRUE)
res_MyOrganism

summary(res_MyOrganism)

#Step3

rlog_MyOrganism <- rlog(dds_normal_MyOrganism, bS5nd = FALSE)

pca_MyOrganism <- plotPCA(rlog_MyOrganism, returnData = TRUE)

pca_MyOrganism_b <- plotPCA(rlog_MyOrganism)

#Step4

dist_MyOrganism <- dist(t(assay(rlog_MyOrganism)))

dist_matrix_MyOrganism <- as.matrix(dist_HM_MyOrganism)

pheatmap(dist_matrix_MyOrganism);pca_MyOrganism_b

#Step5

result_DGE_S1_S2 <- results(dds_normal_MyOrganism, contrast = c("condition", "S1", "S2"), alpha = 0.05, lfcThreshold = 1)
result_DGE_S1_S2

Then I repeat this step (5) comparing all the stages in a pair-wise manner.

#Step6

up_reg01 <- as.data.frame(result_DGE_S1_S2)
up_reg01 <- up_reg01 %>% filter(padj <0.5 & log2FoldChange > 1)
write.csv(up_reg01, "/Users/vitoriatsantos/Documents/BioInfo/Poly_zor_genome/DESeq2/output_DESeq/MyOrganism_UP_S1_S2.csv", row.names = TRUE)

down_reg01 <- as.data.frame(result_DGE_S1_S2)
down_reg01 <- down_reg01 %>% filter(padj <0.5 & log2FoldChange < -1)
write.csv(down_reg01, "/Users/vitoriatsantos/Documents/BioInfo/Poly_zor_genome/DESeq2/output_DESeq/MyOrganism_DOWN_S1_S2.csv", row.names = TRUE)

I also repeat this step (6) for all the comparisons obtained in the previous step.

DESeq2 • 477 views

ADD COMMENT • link updated 21 months ago by Istvan Albert 102k • written 21 months ago by vitoria-ts • 0

score 0 · Answer 1 · 2023-05-09

The point of a statistical study is not to tweak the method until it includes all the points you think should be there.

That is why we use a statistical method to have provide us with a more objective assessment of the variability in the data.

A statistical method evaluate the intra-replicate (within replicates) variability against the inter-sample (across condition) variability.

The p-value reflects the chance of observing an effect size (or larger) by chance (when in reality there is no change).

When your intra-replicate variability is larger, or where the intra-sample variability is small, then the chance of observing the effect size by chance alone is much larger.

So evaluate your data not in the sense of how to make it do what you want it to do, but, instead think about why that particular row does not pass "judgment".