Hi,
I would load the data into R and save each sample with e.g. write.table(myfile, file="myfile.tab", sep = " ", row.names = FALSE, col.names = FALSE)
in one directory, with the file ending ".tab".
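As a rough sketch of that export step, assuming your counts start out as one wide table with a gene column and one column per sample: the name countTable, the sample names and all values below are made up for illustration. It writes one two-column file (gene ID, count) per sample, without a header, which is the layout DESeqDataSetFromHTSeqCount reads.

```r
# Toy count table: one row per gene, one column per sample (values are invented)
countTable <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  untreated1 = c(10L, 0L, 250L),
  treated1 = c(35L, 2L, 90L)
)

# Write into a temporary directory here; use your own directory in practice
outDir <- file.path(tempdir(), "counts_tab")
dir.create(outDir, showWarnings = FALSE)

# One file per sample: gene ID in column 1, count in column 2, no header
for (s in setdiff(names(countTable), "gene")) {
  write.table(countTable[, c("gene", s)],
              file = file.path(outDir, paste0(s, ".tab")),
              sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

list.files(outDir, pattern = "\\.tab$")
```

With file names like these, the first characters of each name can serve as the condition label later on.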
Now you could do the following:
#Define directory
setwd("Path_to_your/Files")
directory <- getwd()
#First you need a sample table for DESeq2, which records where each data file is and which condition it represents
#Grab all files ending in .tab (.xlsx files cannot be read directly, so export them as plain text first)
sampleFiles <- list.files(directory, pattern="\\.tab$")
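#Name the samples in sampleTable with the characters before the ".tab"
sampleCondition <- sub("\\.tab$","",sampleFiles)
#Use the first three characters of each name as the condition label (adjust this to your file names)
sampleCondition <- substr(sampleCondition,1,3)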
#Create the sample table
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
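#Now you have to install DESeq2 if you haven't done that already
#(biocLite() has been replaced by the BiocManager package in current Bioconductor releases)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("DESeq2")
#Load the package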
library("DESeq2")
#Create DESeqDataSet from your data
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design = ~ condition)
#Now remove genes with fewer than 1 read summed over all samples (i.e. genes with no reads at all), to get rid of noise
keep <- rowSums(counts(ddsHTSeq)) >= 1
dds <- ddsHTSeq[keep,]
#To establish the reference for the comparisons, define one condition as your reference level, depending on your data
#("untreated" is a placeholder: it must match one of levels(dds$condition))
dds$condition <- relevel(dds$condition, ref="untreated")
#The following is the core function of the DESeq2 package; it runs the differential expression analysis
dds <- DESeq(dds)
#Now you can access the results, for example with the following (resultsNames(dds) lists the valid names)
res <- results(dds, name="condition_treated_vs_untreated")
#Some other analysis functions that could follow
#Transforming the count data for visualization
vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)
ntd <- normTransform(dds)
#Creating heatmap
library("pheatmap")
select <- order(rowMeans(counts(dds,normalized=TRUE)),
decreasing=TRUE)[1:20]
df <- as.data.frame(colData(dds) )
pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE,
cluster_cols=FALSE, annotation_col=df)
#Heatmap of the sample-to-sample distances
sampleDists <- dist(t(assay(vsd)))
library("RColorBrewer")
sampleDistMatrix <- as.matrix(sampleDists)
#The sample table above has no "type" column, so label the rows by condition only
rownames(sampleDistMatrix) <- as.character(vsd$condition)
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
clustering_distance_rows=sampleDists,
clustering_distance_cols=sampleDists,
col=colors)
#Doing PCA
plotPCA(vsd, intgroup=c("condition"))
These are just some basic applications of the DESeq2 package. There are many other interesting things to do, which you can find on various help sites and blogs, for example in the vignette or the workflow on Bioconductor.
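One common follow-up is to keep only the significant genes and write them to a file. The sketch below uses a small made-up data frame as a stand-in for as.data.frame(res), so it runs without DESeq2. The column names match those of a DESeq2 results table, but the gene names, the values and the 0.05 cutoff are assumptions for illustration.

```r
# Toy stand-in for as.data.frame(res); all values are invented
resDF <- data.frame(
  baseMean = c(120.5, 8.2, 560.1, 33.0),
  log2FoldChange = c(2.1, -0.3, -1.8, 0.9),
  pvalue = c(0.0001, 0.6, 0.002, NA),
  padj = c(0.0004, 0.8, 0.004, NA),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Keep genes with an adjusted p-value below 0.05; which() drops the NA rows
sig <- resDF[which(resDF$padj < 0.05), ]

# Most significant genes first
sig <- sig[order(sig$padj), ]

# Save the filtered table (temporary directory here; pick your own path in practice)
outFile <- file.path(tempdir(), "significant_genes.csv")
write.csv(sig, file = outFile)
```

With real data you would replace resDF by as.data.frame(res) from the script above.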
How exactly is your data structured? Do you have one Excel file per treatment?