Question

Differential gene expression by DESeq2

0

Entering edit mode

6.4 years ago

umeshtanwar2 ▴ 30

Hi, I am working on RNA-Seq data for Arabidopsis plant. I have samples from 4 genotypes (WT, KO1, KO2, OE) with 3 replicates for each. I used STAR for alignment with reference genome and annotation file in gff3 format. Then I obtained count.txt files for each sample by using featureCounts. Now I am doing the DGE analysis by DESeq2 as:

countdata <- read.table("counts.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
head(countdata)
(condition <- factor(c(rep("WT", 3), rep("KO1", 3), rep("KO2", 3), rep("OE", 3))))
library(DESeq2)
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dds <- DESeq(dds)

I have following questions: 1. I used annotation file gff3 for alignment and for featureCounts, I converted gff3 to gtf format. Is it the right way or it would make any impact for DGE analysis? 2. I used SAM files in featureCounts. Is it okay or I should use BAM files? 3. In DESeq2 I am getting the results for each replicate individually. How can I merge the replicates for one genotype to compare with the other genotypes. I am not sure if the code I am using is right?

Any guidance from you will be very helpful.

Thank you

RNA-Seq • 2.2k views

ADD COMMENT • link updated 6.4 years ago by i.sudbery 21k • written 6.4 years ago by umeshtanwar2 ▴ 30

score 1 · Answer 1 · 2019-02-22

1

Entering edit mode

6.4 years ago

i.sudbery 21k

This should be fine.
As should this.
You need a final step in your analysis, which it to get the statistical results from your analysis. This is achieved using the "results" function. As in:
```
deseq_results <- results(dds, alpha=0.05, threshold=....., .....)
```
where threshold is the minimum fold change you are interested in . You will also need to specify the contrast you are intersted in. This will depend on whether you are interested in asking if either: a) genotype has an effect on expression overall, or b) if each genotype is different from a control. See the DESeq2 manual for details.

ADD COMMENT • link 6.4 years ago by i.sudbery 21k

0

Entering edit mode

Thank you @i.sudbery. It worked. Its showing:

out of 25686 with nonzero total read count adjusted p-value < 0.05 LFC > 1.00 (up) : 627, 2.4% LFC < -1.00 (down) : 2473, 9.6% outliers [1] : 0, 0% low counts [2] : 2944, 11% (mean count < 1) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

Can you please tell me how can I extract the upregulated and downregulated genes?

ADD REPLY • link 6.4 years ago by umeshtanwar2 ▴ 30

0

Entering edit mode

the deseq_results object should have two columns: one for adjusted pvalues (I can't remember if its called padj or adjustedPVal off the top of my head) and one for log2FoldChange. The up genes will be those with adjustedPVal < 0.05 and log2FoldChange > 1, and the down genes will be those with adjustedPVal < 0,05 and log2FoldChange < -1.