Question

differential expression analysis

6.7 years ago

bioinfo456 ▴ 150

I have my data in an Excel file (i.e. gene IDs, samples, and the corresponding gene counts). Can somebody please explain to me how I can feed this to DESeq2? I have imported the Excel file into RStudio. What next? Any sort of help would be much appreciated. Thanks.

RNA-Seq deseq2 differential expression analysis • 1.6k views

updated 6.7 years ago by caggtaagtat ★ 1.9k • written 6.7 years ago by bioinfo456 ▴ 150

How exactly is your data structured? Do you have one Excel file per treatment?

6.7 years ago by caggtaagtat ★ 1.9k

score 1 · Answer 1 · 2018-03-20

Hi,

I would load the data into R, split it into one file per sample with two columns each (gene ID and count), and save every file with the ending ".tab" in one directory, e.g. with write.table(myfile, file="myfile.tab", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE).
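A minimal sketch of that export step, assuming the Excel sheet was imported as a data frame whose first column holds the gene IDs and whose remaining columns hold one sample each. The object name counts_df, the sample names, the counts, and the temporary output folder are all made up for illustration:

```r
# Toy stand-in for the imported Excel sheet (hypothetical names and values)
counts_df <- data.frame(
  gene_id     = c("geneA", "geneB", "geneC"),
  untreated_1 = c(10L, 0L, 25L),
  untreated_2 = c(12L, 1L, 30L),
  treated_1   = c(40L, 0L, 5L),
  treated_2   = c(38L, 2L, 7L)
)

# Assumption: write into a temporary folder; use your own directory instead
out_dir <- file.path(tempdir(), "htseq_style_counts")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One two-column file (gene ID, count) per sample, tab-separated, no header
sample_cols <- setdiff(names(counts_df), "gene_id")
for (s in sample_cols) {
  write.table(counts_df[, c("gene_id", s)],
              file = file.path(out_dir, paste0(s, ".tab")),
              sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# List the files that were written
list.files(out_dir, pattern = "\\.tab$")
```

File names that start with the condition (untreated_1.tab, treated_1.tab) make it easy to derive the condition column of the sample table from them later.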

Now you could do the following:

#Define directory
setwd("Path_to_your/Files")
directory <- getwd()

#First you need a sample Table for DESeq2 which holds information about where your data is and what condition it stands for

#Grab all files ending in .tab (this expects plain-text files, so it will not work for .xlsx files)
sampleFiles <- list.files(directory, pattern="\\.tab$")

#Name the samples in the sample table with the characters before the ".tab"
sampleNames <- sub("\\.tab$","",sampleFiles)
#Assumption: file names start with the condition, e.g. untreated_1.tab, treated_1.tab
sampleCondition <- sub("_.*$","",sampleNames)

#Create the sample table
sampleTable <- data.frame(sampleName = sampleNames,
                          fileName = sampleFiles,
                          condition = factor(sampleCondition))

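#Now you have to install DESeq2 if you haven't done that already
#(biocLite is no longer supported; current Bioconductor releases use BiocManager)
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("DESeq2")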

#Get it in your library
library("DESeq2")

#Create DESeqDataSet from your data
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design= ~ condition)

#Now remove genes without any reads, to get rid of noise
keep <- rowSums(counts(ddsHTSeq)) >= 1
dds <- ddsHTSeq[keep,]

#To set the reference for the comparisons, define one condition level (depending on your data) as your reference
dds$condition <- relevel(dds$condition, ref="untreated")

#The following is the core function of the DESeq2 package; it estimates size factors and dispersions and fits the model
dds <- DESeq(dds)

#Now you can access the results, for example with
res <- results(dds, name="condition_treated_vs_untreated")

#Some further analyses you could run

#Transforming the counts for visualization (variance stabilizing, regularized log, log2(n+1))
vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)
ntd <- normTransform(dds)

#Creating heatmap
library("pheatmap")
select <- order(rowMeans(counts(dds,normalized=TRUE)),
                decreasing=TRUE)[1:20]
df <- as.data.frame(colData(dds)[,"condition",drop=FALSE])
pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
         cluster_cols=FALSE, annotation_col=df)



#Heatmap of the sample-to-sample distances
sampleDists <- dist(t(assay(vsd)))

library("RColorBrewer")
sampleDistMatrix <- as.matrix(sampleDists)
#Label the rows with the sample names (the sample table above has no "type" column)
rownames(sampleDistMatrix) <- colnames(vsd)
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         color=colors)


#Doing PCA
plotPCA(vsd, intgroup=c("condition"))

These are just some basic applications of the DESeq2 package. There are many more interesting things you can do, which are described on various help sites and blogs, for example in the vignette or the workflow on Bioconductor.
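If the counts already sit in a single table in R, splitting them into files is not strictly needed: DESeq2 also provides DESeqDataSetFromMatrix, which takes a count matrix and a sample table directly. A minimal sketch, using made-up gene IDs, sample names and counts (all hypothetical, as is the rule that the condition is the part of the sample name before the underscore). The DESeq2 call only runs if the package is installed:

```r
# Toy stand-in for the imported Excel sheet (hypothetical names and values)
counts_df <- data.frame(
  gene_id     = c("geneA", "geneB", "geneC", "geneD"),
  untreated_1 = c(10, 0, 25, 100),
  untreated_2 = c(12, 1, 30, 90),
  treated_1   = c(40, 0, 5, 110),
  treated_2   = c(38, 2, 7, 95)
)

# Count matrix: genes in rows, samples in columns, gene IDs as row names
count_mat <- as.matrix(counts_df[, -1])
rownames(count_mat) <- counts_df$gene_id
storage.mode(count_mat) <- "integer"

# Sample table: one row per sample, row names matching the matrix columns.
# Assumption: the condition is the part of the sample name before "_";
# listing "untreated" first makes it the reference level
col_data <- data.frame(
  condition = factor(sub("_.*$", "", colnames(count_mat)),
                     levels = c("untreated", "treated")),
  row.names = colnames(count_mat)
)

# Build the DESeqDataSet only if DESeq2 is available
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = count_mat,
                                        colData   = col_data,
                                        design    = ~ condition)
  print(dds)
}
```

From here, DESeq(dds) and results(dds) are used the same way as with the file-based import above.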