I have a metadata file as below and a count matrix file. I would like to run a differential expression analysis using DESeq2 to find differentially expressed genes, taking into account Age and Gender (A = Young, B = Middle Age, C = Old).
My question: can someone check my code and tell me exactly what needs to be changed, where necessary?
Thank you very much in advance for your time and help.
library("DESeq2")
b <- read.table("ARGS_GA", as.is = TRUE, header = TRUE, sep = "\t")
a <- read.table("matrix_countA", header = TRUE, row.names = 1, sep = "\t")
dds <- DESeqDataSetFromMatrix(countData = a, colData = b, design = ~ condition + Gender + Age)
dds <- dds[rowSums(counts(dds)) >= 2, ]
dds <- DESeq(dds)
res <- results(dds, contrast = list("condition_CONTROL_vs_CASE"))
saveRDS(res, file = "case_control.RData")
resOrdered <- res[order(res$padj), ]
saveRDS(resOrdered, file = "case_control_ordered.RData")
sig <- resOrdered[!is.na(resOrdered$padj) &
                  resOrdered$padj < 0.10 &
                  abs(resOrdered$log2FoldChange) >= 1, ]
saveRDS(sig, file = "case_control.30sig2.RData")
How To Ask Good Questions On Technical And Scientific Forums
Sorry I am new to posting.
Thank you
Biostars is not a code review forum. If you have a specific question or error, we can probably help you out, but throwing a big hunk of code at people with little context is unlikely to get you the help you require.
You should typically put the variable you're interested in last in your design, so if you're interested in differences due to condition while accounting for differences from Age and Gender, your design would be ~ Age + Gender + condition.
Also, sig and sig2 are the same thing here, so not quite sure the purpose of that bit.
Thank you,
I would like someone to scan the code to see if there is anything wrong with my approach/code.
Thank you
sig2 anywhere in your code
Thanks Kevin, I made the correction to my code.
When I ran PCA on the normalized counts, Gender appeared to separate the data of controls + cases.
Just to clarify, would the following give results that already take care of gender and age based on the design,
or do I need to run something more to get differentially expressed genes that have the effects of gender and age removed?
Thank you very much for your time and help
Yes, as per Asaf's comment
Out of curiosity, why? I always thought that the order of variables does not matter.
Clarity, and the results() command will use contrasts for the last variable by default. You're right that it makes no functional difference though; you can get any comparison you want from those variables via proper results() calls.
Sorry, but can someone check my design? I am still unclear on exactly how to identify the differentially expressed genes taking into account Age and Gender.
Thank you again in advance for your time and help
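To make the advice in this thread concrete, here is a minimal sketch of a design with condition last and an explicit contrast. It runs on simulated toy data, not on the original files: the objects counts and meta are hypothetical stand-ins for "matrix_countA" and "ARGS_GA", and the level names (CONTROL/CASE, M/F, A/B/C) are assumptions about how the metadata is coded. With ~ Age + Gender + condition, the condition effect reported by results() is already adjusted for Age and Gender, so no extra step should be needed for that.

```r
library(DESeq2)

# Hypothetical toy data standing in for the real count matrix and metadata.
set.seed(1)
n_genes <- 200
n_samples <- 12
counts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 2),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
meta <- data.frame(
  condition = rep(c("CONTROL", "CASE"), each = 6),
  Gender    = rep(c("M", "F"), times = 6),
  Age       = rep(c("A", "B", "C"), times = 4),
  row.names = colnames(counts)
)

# Make every design variable a factor and set the reference levels explicitly,
# so that fold changes are reported as CASE relative to CONTROL.
meta$condition <- factor(meta$condition, levels = c("CONTROL", "CASE"))
meta$Gender    <- factor(meta$Gender)
meta$Age       <- factor(meta$Age, levels = c("A", "B", "C"))

# The metadata rows must be in the same order as the count matrix columns.
stopifnot(identical(rownames(meta), colnames(counts)))

# Covariates first, variable of interest last.
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = meta,
                              design    = ~ Age + Gender + condition)
dds <- dds[rowSums(counts(dds)) >= 2, ]
dds <- DESeq(dds)

# List the available coefficients, then request the comparison explicitly.
resultsNames(dds)
res <- results(dds, contrast = c("condition", "CASE", "CONTROL"))

# Same filter as in the original post: padj < 0.10 and |log2FC| >= 1.
resOrdered <- res[order(res$padj), ]
sig <- resOrdered[!is.na(resOrdered$padj) &
                  resOrdered$padj < 0.10 &
                  abs(resOrdered$log2FoldChange) >= 1, ]
```

On real data, replace the simulated counts and meta with the tables read from file. Because the toy counts are random, sig is expected to be empty or nearly empty here.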