Hi, I'm trying to follow the following tutorial (https://scienceparkstudygroup.github.io/rna-seq-lesson/08-cluster-analysis/index.html) to generate the differentially expressed genes in a developmental timeline and perform gene cluster analysis but I'm struggling with editing the script for my data and was hoping someone might be able to help me. I'm struggling with creating the list of DE genes and the normalized counts for these DE genes with the loop used (section 2.2) and was wondering if this could be done in a different way - more manually by generating the DE gene lists of each of my interested combinations. Any help will be much appreciated.
Sorry I wasn't clear about the problem. I'm unclear how to modify the code to loop through the different combinations of different expression analyses using the loop. I've copied the code that the tutorial used below. I don't have any columns to discard so I modified that. I also want to compare my samples by time so that would be my condition (I modified the "conditions" in the script with "Time"). I wasn't sure if I had to modify anything else in the script - I don't have much experience with loops so wanted to see if anything might be obvious to someone else. I've pasted the error I get at the end.
<h6>#</h6>normalize <- function(countdata, xp_design, f, p){ col.names = colnames(xp_design) # extract all column names columns.to.discard = c("sample","fq1","fq2","fq") # column we don't need to extract all conditions colsForConditions = col.names[! col.names %in% columns.to.discard] # only keeping the column of interest
one condition
if (length(colsForConditions) == 1){ condition <- factor(xp_design[,colsForConditions])
} else if (length(colsForConditions) == 2){
} else if (length(colsForConditions) == 3){ xp_design$conditions = paste(xp_design[,colsForConditions[1]],xp_design[,colsForConditions[2]],xp_design[,colsForConditions[3]],sep = ".") condition <- factor(x = xp_design[,"conditions"],levels = unique(xp_design[,"conditions"])) }
Analysis with DESeq2
Create a coldata frame and instantiate the DESeqDataSet. See ?DESeqDataSetFromMatrix
coldata <- data.frame(row.names=colnames(countdata), condition) dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
Run the DESeq pipeline
dds <- DESeq(dds)
create dataframe containing normalized counts, to which the differential expression values will be added
resdata <- as.data.frame(counts(dds, normalized=TRUE))
iterate through the list of conditions to create differential expression (DE) values for all possible pairs
total_selec <- list() x <- 1 for(i in levels(condition)){ x <- x + 1 if (x <= length(levels(condition))){ for(j in x:length(levels(condition))){ res <- results(dds, contrast=c("condition", i, levels(condition)[j])) # i = first in pair, levels(condition)[j] is the second in pair. d <- paste(i, levels(condition)[j], sep="&") # paste the two conditions in one character, to be used as the pair name res$genenames <- rownames(res) resul <- as.data.frame(res) significantDE <- resul %>% filter(padj<p & (log2FoldChange>f | log2FoldChange<(-1*f)) ) selec <- as.list(significantDE$genenames) total_selec <- append(total_selec, selec) } } } total_selec <- c(unique(total_selec)) total_selec <- t(as.data.frame(total_selec)) selection <- resdata[total_selec[,1],] return(selection) }
DEgenes <- normalize(counts,xp_design,2,0.05)
<h6>#</h6>Error in data.frame(row.names = colnames(countdata), condition) : row names supplied are of the wrong length 3. stop("row names supplied are of the wrong length") 2. data.frame(row.names = colnames(countdata), condition) 1. normalize(counts, xp_design, 2, 0.05)