Good afternoon,
I have a question about using collapseReplicates in DESeq2. As far as I understand, this function adds up the counts belonging to one biosample. I understand the meaning of this if the number of technical replicates per biosample is the same for all samples. Please tell me what I should do if the number of technical replicates per sample differs.
For example, SAMNXXXXX corresponds to SRRXXXXX1 (count = 5) and SRRXXXXX2 (count = 6), while SAMNYYYYY corresponds only to SRRYYYYY1 (count = 10). If I add up the counts for SAMNXXXXX (5 + 6 = 11) and then compare the sum with the count for SAMNYYYYY (10), I will reach the incorrect conclusion that the expression is higher in SAMNXXXXX.
Maybe I need to take the arithmetic mean or something else? It seems to me that the arithmetic mean is not very reasonable. For example, I have counts of 181 and 2 for different replicates of the same biosample.
Note: this situation does not occur for most samples. For example, in one particular dataset there are 89 biosamples without technical replicates and 5 biosamples with 2 technical replicates each.
Thanks!
Best regards, Poecile
Thank you very much! Could you please confirm that I am acting in the correct order?
etc.
Normalization happens after collapseReplicates, during this step:
So you are all good!
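To make the point about ordering concrete, here is a small plain-Python sketch of the arithmetic. It is only an illustration, not DESeq2 code: the sample names, run names and counts are hypothetical, and the size factors are computed with a simple reimplementation of the median-of-ratios idea. It sums the technical replicates first and only then scales each sample by its size factor, so the extra depth of a sample with two runs is absorbed by the normalization.

```python
import math
from statistics import median

# Hypothetical per-run counts for four genes (not from any real dataset).
runs = {
    "X_run1": [5, 100, 200, 50],
    "X_run2": [6, 100, 200, 50],
    "Y_run1": [10, 100, 200, 50],
}
# Which biosample each run belongs to (hypothetical labels).
run_to_sample = {"X_run1": "X", "X_run2": "X", "Y_run1": "Y"}


def collapse(runs, run_to_sample):
    """Sum per-gene counts over all runs of the same biosample."""
    collapsed = {}
    for run, counts in runs.items():
        sample = run_to_sample[run]
        if sample not in collapsed:
            collapsed[sample] = list(counts)
        else:
            collapsed[sample] = [a + b for a, b in zip(collapsed[sample], counts)]
    return collapsed


def size_factors(counts):
    """Median-of-ratios size factor per sample, computed on the log scale."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Use only genes with a nonzero count in every sample.
    usable = [g for g in range(n_genes) if all(counts[s][g] > 0 for s in samples)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples) for g in usable
    }
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_geo_mean[g] for g in usable))
        for s in samples
    }


# Collapse first, normalize afterwards.
collapsed = collapse(runs, run_to_sample)
factors = size_factors(collapsed)
normalized = {s: [c / factors[s] for c in collapsed[s]] for s in collapsed}

for s in normalized:
    print(s, round(factors[s], 3), [round(v, 2) for v in normalized[s]])
```

In this toy data the size factor of X comes out twice that of Y, so after scaling the summed count of 11 no longer looks higher than 10, while the genes with identical per-run counts end up equal in both samples.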
Thank you for your help!