Question

Differential expression analysis 5 conditions 3 replicates each matrix counts

8.2 years ago

Pin.Bioinf ▴ 350

Hello, I have to do my first differential expression analysis. A company has already done it, and I'm repeating it to practice and to compare results. They used DESeq2 and I want to use it too. I got the count tables from the STAR mapper, which also quantifies reads when run with the option --quantMode GeneCounts. The experiment is:

5 conditions (3 of them are naive or mock controls):
- plant not infected (NAIVE), control
- plant infected (with fly)
- plant infected (with bacteria)
- plant not infected but exposed to fly (mock), control
- plant not infected but exposed to bacteria (mock), control

All these samples were taken at 2, 7, 14 and 21 dpi, so I'm thinking about building a count matrix for each dpi and doing 4 differential expression analyses. (After that I will do a temporal analysis, so I just want to compare results.)

Is it reasonable to build a matrix for each dpi like this?:

GENE_ID COUNT NAIVE21 INFECTED_FLY21 INFECTED_B21 NOTINFECTED_FLY NOTINFECTED_BACT

Also, I have 3 replicates for each. What do I do with them? (I'm new to this kind of analysis and have no idea what to do with the 3 replicates of each sample; it would be a huge table if I added NAIVE21a, NAIVE21b, NAIVE21c, and so on...)

Thank you so much

Pilar

RNA-Seq • 3.2k views

Could you clarify your experimental design?

My guess would be that dpi is days post infection, and that you have 3 biological replicates for each condition (infected with bacteria, naive...) at different times (2 days, 7 days...). If so, the experimental design is crucial for creating a DESeq2 object. Could you update the question with a clearer explanation of the design so that we can help you?

Reply • 8.2 years ago by IP ▴ 780

Hello, yes, I'll explain better:

Samples were taken at 2, 7, 14 and 21 days post infection. There are 5 conditions, and 3 plants (biological replicates) for each condition.

I decided to do a differential expression analysis for each dpi independently with DESeq2, and I did the following:

sampleTable <- data.frame(row.names = c("Bm14a","Bm14b","Bm14c","BTY14a","BTY14b","BTY14c","Mm14a","Mm14b","Mm14c","MTY14a","MTY14b","MTY14c","N14a","N14b","N14c"), condition = as.factor(c(rep("Bm14", 3), rep("BTY14", 3), rep("Mm14", 3), rep("MTY14", 3), rep("N14", 3))))

dds <- DESeqDataSetFromMatrix(countData = cts, colData = sampleTable, design = ~ condition)
dds <- DESeq(dds)  # fit the model; results() needs a fitted object

Then, for every comparison (there are 7) I did this:

Comp_1<-results(dds, contrast=c("condition","N14","Bm14"))

...

And to count the total differentially expressed genes for each comparison I did:

Comp_1_resSig <- Comp_1[which(Comp_1$padj <0.1),]

head(Comp_1_resSig[order(Comp_1_resSig$log2FoldChange, decreasing = TRUE),])

nrow(Comp_1_resSig)

Does this make sense? Or did I do something wrong?

Reply • 8.2 years ago by Pin.Bioinf ▴ 350

Devon Ryan · Accepted Answer · 2017-05-03

There's no reason to manually build a matrix yourself. Rather, create a sample table listing the samples, their group associations and the files with the counts, and give that to DESeq2, likely via the DESeqDataSetFromHTSeqCount() function or something along those lines. You will then include all of the biological replicates, which DESeq2 already knows how to handle (and is worthless without).
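A minimal sketch of such a sample table in base R, using the sample names that appear elsewhere in this thread. The file names (`<sample>_ReadsPerGene.out.tab`) and the helper names `read_star_counts` and `build_count_matrix` are hypothetical, and reading column 2 assumes an unstranded library. STAR's GeneCounts files carry four count-summary rows (`N_unmapped` and so on) and more than two columns, so this sketch reads them directly rather than through the HTSeq importer.

```r
# One row per replicate: condition + dpi + replicate letter (a, b, c).
conditions <- c("Bm14", "BTY14", "Mm14", "MTY14", "N14")
sampleTable <- data.frame(
  condition = factor(rep(conditions, each = 3)),
  row.names = paste0(rep(conditions, each = 3), c("a", "b", "c"))
)
# Hypothetical file names; adjust to wherever STAR wrote its output.
sampleTable$file <- paste0(rownames(sampleTable), "_ReadsPerGene.out.tab")

# Read one STAR GeneCounts file and return a named vector of counts.
# Column 2 = unstranded counts (assumption); columns 3 and 4 are for
# stranded libraries. Rows starting with "N_" are summaries, not genes.
read_star_counts <- function(path, column = 2) {
  tab <- read.delim(path, header = FALSE, stringsAsFactors = FALSE)
  tab <- tab[!grepl("^N_", tab[[1]]), ]
  setNames(tab[[column]], tab[[1]])
}

# Build a genes x samples matrix with one column per replicate,
# in the same order as the rows of sampleTable.
build_count_matrix <- function(sampleTable, dir = ".") {
  cts <- sapply(file.path(dir, sampleTable$file), read_star_counts)
  colnames(cts) <- rownames(sampleTable)
  cts
}
```

The resulting matrix and `sampleTable` could then be passed to `DESeqDataSetFromMatrix(countData = cts, colData = sampleTable, design = ~ condition)`. All 15 replicate columns stay in the table; they are what DESeq2 uses to estimate within-group variability.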

Hello, thank you so much, I was a little lost. Here is what I did once I had my count matrix as cts. I don't know if the results I got are correct. I wanted to do many comparisons, and I have 3 samples for each of 5 conditions:

sampleTable<-data.frame(row.names=c("Bm14a","Bm14b","Bm14c","BTY14a","BTY14b","BTY14c","Mm14a","Mm14b","Mm14c","MTY14a","MTY14b","MTY14c","N14a","N14b","N14c"), condition=as.factor(c(rep("Bm14",3), rep("BTY14", 3), rep("Mm14", 3), rep("MTY14", 3),rep("N14", 3))))
dds <- DESeqDataSetFromMatrix(countData = cts,colData = sampleTable,design = ~ condition)

Pre-filtering:

dds <- dds[ rowSums(counts(dds)) > 1, ]
dds <- DESeq(dds)

All my comparisons:

Comp_1<-results(dds, contrast=c("condition","N14","Bm14"))
Comp_2<-results(dds, contrast=c("condition","N14","BTY14"))
Comp_3<-results(dds, contrast=c("condition","N14","Mm14"))

and so on ...

Comp_7<-results(dds, contrast=c("condition","MTY14","Mm14"))

And to check the total differentially expressed genes for each comparison I did the following for each Comp_n:

Comp_1_resSig <- Comp_1[which(Comp_1$padj <0.1),]
head(Comp_1_resSig[order(Comp_1_resSig$log2FoldChange, decreasing = TRUE),])
nrow(Comp_1_resSig)

Is what I did correct? Are my results reliable? Are the p-values adjusted for each comparison? (I did not relevel because I read that for so many comparisons it won't make a difference.)

Reply • written 8.2 years ago by Pin.Bioinf ▴ 350 • updated 8.2 years ago by Devon Ryan 105k

That looks correct, as long as you're truly just interested in pairwise comparisons.
The padj column has the adjusted p-values; each results() call adjusts them within that comparison only (Benjamini-Hochberg by default).
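To illustrate what the adjustment does within a single comparison, here is a small base R sketch. The p-values are made up for illustration, and the Benjamini-Hochberg method is used on the assumption that the DESeq2 default was left unchanged.

```r
# Made-up raw p-values for five genes in ONE comparison (illustration only).
pvals <- c(geneA = 0.001, geneB = 0.01, geneC = 0.02, geneD = 0.04, geneE = 0.5)

# Benjamini-Hochberg adjustment over the genes of this comparison alone.
padj <- p.adjust(pvals, method = "BH")

# Count genes passing the same 0.1 cutoff used in the thread.
n_sig <- sum(padj < 0.1)
```

In real DESeq2 output `padj` can be `NA` for some genes (for example, those removed by independent filtering), which is why the `which()` in the code above, or `sum(padj < 0.1, na.rm = TRUE)`, is the safer way to count.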

Reply • 8.2 years ago by Devon Ryan 105k

Thanks Devon. I don't know what else I should be interested in or what else I could do (maybe something regarding the time series?).

Again, thank you

Reply • 8.2 years ago by Pin.Bioinf ▴ 350