Hi Everyone:
I am trying to analyze a set of RNAseq data using DESeq2 to do a GO enrichment analysis. In my dataset have 8 groups. The groups are cond_09 cond_10 cond_11 cond_12 cond_13 cond_14 cond_15 cond_16
with 3 replicatesA, B & C
' for each group.
The columns in my input TSV file are (GeneID cond_09_A cond_09_B cond_09_C cond_10_A cond_10_B ......... cond_16_A cond_16_B cond_16_C
).
Right now, my goal is to find out compare all other groups against cond_09
and to calculate the fold change of the genes. I believe instead of doing it pairwise I should feed all the data for all the groups at once, and then estimate what I want. Here is my script I used to do it. I wanted to use the for loop to calculate the fold changes it for the pairs I want and then to store it in a unique TSV. I am getting the following couple of errors inside the for loop.
What am I doing wrong? I am brand new to R and I am finding it so confusing compared to MATLAB.
res <- results(dds, pAdjustMethod="BH", contrast=c("sample", caseA, caseB))
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues, : 1 and cond_10 should be levels of sample such that sample_1_vs_cond_09 and sample_cond_10_vs_cond_09 are contained in 'resultsNames(object)'write.table(data.frame(res), + file = 'output.tsv') sep='\t',
Error: unexpected ',' in "sep='\t'," row.names=F,
Error: unexpected ',' in "row.names=F," quote=F)
Error: unexpected ')' in "quote=F)"
#!/usr/bin/env Rscript
library(DESeq2)
library(tidyverse)
#print("ensure that Infected = B and uninfected = A to get correct results")
df <- read.table('data_scaled_normalized_counts.tsv', header=T)
samplename_df <- data.frame(samplename = names(df)[2:ncol(df)])
samplename_df$sample <- substr(samplename_df$samplename, 1,7)
samplename_df$batch <- substr(samplename_df$samplename, 9,9)
row.names(samplename_df) <- samplename_df$samplename
samplename_df$sample <- factor(samplename_df$sample)
count_matrix <- subset(df, select = -c(GeneID) )
row.names(count_matrix) <- df$GeneID
dds <- DESeqDataSetFromMatrix(countData = count_matrix[ , order(names(count_matrix))],
colData = samplename_df[order(samplename_df$sample),],
design = ~ sample)
dds <- DESeq(dds)
#res <- results(dds, pAdjustMethod="BH", contrast=c("sample","cond_09","cond_10"))
samplelist <- unique(samplename_df$sample)
caseA <- samplelist[1]
for (j in samplelist[2:length(samplelist)]){
caseB <- j
#res <- results(dds, pAdjustMethod="BH", contrast=c("sample", 'cond_09', 'cond_10'))
res <- results(dds, pAdjustMethod="BH", contrast=c("sample", caseA, caseB))
print(res)
res$GeneId = row.names(res)
write.table(data.frame(res),
file = paste("output", caseB, "vs", caseA, ".tsv")
sep='\t',
row.names=F,
quote=F)
resultsNames(dds)
}
Additionally, the conditions are timepoints of data collection in a time series experiment at which the samples were collected from tissues that were increasing in terms of dry mass. To account for increasing mass of the tissues, I was advised that I normalize all the reads at all timepoints with the reads of a specific housekeeping gene, as we don't have data on dry biomass of the collected samples. To convert to integer I multiplied each by 100000 and rounded off the reads. I have read in multiple threads that the reads should not be normalized (FPKM etc). Is there any issue with normalizing to read counts of a housekeeping gene, or normalizing to unit cell mass?
Thank you in advance!
Anby
Change caseA definition to
caseA <- as.character(samplelist[1])
and add a comma afterfile = paste("output", caseB, "vs", caseA, ".tsv")
.