For this step:
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~condition)
I have read many examples on the website, but they seem to cover the typical cases. There are three parts I am not sure about:
countData = counts
Generally, the examples show part of a counts matrix like the following, where conl1 and conl2 are two samples:

               conl1 conl2  rep1  rep2
    ENSMUSGid1     0     0     3     4
    ENSMUSGid2     0     1     0     0
    ENSMUSGid3     0     1    10    12
In my data, however, each group has three experimental replicates (conl1 alone has three), so the column names of my raw countData are:
conl1_1, conl1_2, conl1_3, conl2_1, conl2_2, ...
I plan to simply treat all of them (conl1_1, conl1_2, conl1_3) as conl. Is that okay?

colData = coldata
Can this coldata contain all the potential variables I want to study, including cell type, batch, gender, and age columns? And can I then set design = ~condition1 + condition2 + condition3 ... (ordered according to their proportion of variance in the PCA)? Or should I study each variable one by one?

design = ~condition
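A hedged base-R sketch for checking whether a multi-variable design can be estimated at all before passing it to DESeq2. DESeq2 fits the design through a model matrix built from the formula and colData, and it stops with an error when that matrix is not full rank, for example when two variables are perfectly confounded. The covariates below (cell_type, sex, batch, donor), their values, and the helper is_full_rank() are hypothetical, for illustration only.

```r
# Hypothetical sample sheet with several covariates (illustration only).
coldata <- data.frame(
  cell_type = factor(c("A", "A", "A", "B", "B", "B")),
  sex       = factor(c("F", "M", "F", "M", "F", "M")),
  batch     = factor(c("b1", "b1", "b2", "b2", "b1", "b2")),
  row.names = paste0("s", 1:6)
)

# Hypothetical helper: TRUE when the design's model matrix has full column
# rank, i.e. every coefficient can be estimated from these samples.
is_full_rank <- function(design, coldata) {
  mm <- model.matrix(design, data = coldata)
  qr(mm)$rank == ncol(mm)
}

print(is_full_rank(~ sex + batch + cell_type, coldata))

# A covariate that exactly mirrors cell_type is confounded with it,
# so a design containing both cannot be estimated.
coldata$donor <- factor(c("d1", "d1", "d1", "d2", "d2", "d2"))
print(is_full_rank(~ donor + cell_type, coldata))
```

A numeric column such as age enters the model matrix as a single slope term unless it is converted to a factor first.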
The column names of my raw countData also contain batch information. People generally use design = ~condition; can I use design = ~batch + condition (I added a batch column to coldata)? Or should I run DESeq2 with design = ~condition and then again with design = ~batch? Or should I first check whether batch effects exist, correct for them, and then run design = ~condition?
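One way to see what design = ~batch + condition requires of the data, sketched in base R with hypothetical batch labels. The cross-tabulation shows whether each batch contains samples from more than one condition (if every batch held a single condition, batch and condition would be confounded and their effects could not be separated), and the model-matrix column names show the coefficients such a fit would estimate. The DESeq2 vignette recommends placing the variable of interest last in the formula, so condition comes after batch here. The commented-out call only shows where the formula would go; it needs DESeq2 and real counts to run.

```r
# Hypothetical sample sheet: two conditions, each spread over two batches.
samples <- paste0(rep(c("conl1", "conl2"), each = 3), "_", 1:3)
coldata <- data.frame(
  condition = factor(sub("_[0-9]+$", "", samples)),
  batch     = factor(c("b1", "b1", "b2", "b1", "b2", "b2")),
  row.names = samples
)

# Rows = batches, columns = conditions. If each batch held only one
# condition, batch and condition would be confounded.
print(table(coldata$batch, coldata$condition))

# Columns of the model matrix = coefficients the fit would estimate.
mm <- model.matrix(~ batch + condition, data = coldata)
print(colnames(mm))
stopifnot(qr(mm)$rank == ncol(mm))

# With DESeq2 installed and a real count matrix, the formula would be
# passed as:
# dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
#                               design = ~ batch + condition)
```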
Thanks in advance for your attention, and sorry for so many questions. Any ideas on any of these parts would be greatly appreciated!