Dear all,
I have a data set for two conditions (before and after treatment). The patients are in rows and the gene expression levels are in columns. I want to test whether there is any difference between the conditions, but my main problem is that the samples are unbalanced (control 30:case 60).
I would like to resample the 60 case IDs down to 30, compare them with the controls (getting a p-value for each gene), and repeat this around 50 times.
I would like to do that in R, but I don't know of any specific function for it. The boot package (library(boot)) seems nice, but I couldn't figure out how to apply it to my data set.
Any advice?
Thanks
Why not feed this directly into limma or a similar package?
Unbalanced samples are not a problem per se as long as the numbers are sufficient for the dispersion estimation. As JC says, feed the data into standard tools such as limma and obtain DEG results. Typically there is no need for custom approaches.
Hi, I know how to do that in limma. The point is that I want to create a function that I can apply to a gene table, an OTU table and so on; in other words, something that carries over to different approaches. Any clue? Thanks!
Please use the comment function, not the answer box.
It is unclear what you mean. Is the problem how to randomly pull samples? Please give a representative example.
Sorry, the "add comment" button doesn't work for me; it gives me an error every time, which is why I used the answer box.
Let's say I have a data frame like this
and I have 20 healthy samples and 40 cancer samples. I want to compare each gene using something like this
but what I really want is to resample the "cancer" samples (drawing 20 samples each time) and repeat the Wilcoxon test 50 times, once per resample.
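For illustration, here is a minimal sketch of that resampling in base R. It is not a definitive implementation. The data frame `expr`, the `group` factor and the function name `resample_wilcox` are all hypothetical, and the data are simulated, because the original example table is not shown. It assumes samples in rows and one numeric column per gene (or OTU, or any other feature), and it draws the subsets without replacement.

```r
# Hypothetical example data: samples in rows, genes in columns.
# Simulated values only; replace with your own table.
set.seed(1)
n_healthy <- 20
n_cancer  <- 40
n_genes   <- 5
expr <- as.data.frame(matrix(
  rnorm((n_healthy + n_cancer) * n_genes),
  ncol = n_genes,
  dimnames = list(NULL, paste0("gene", seq_len(n_genes)))
))
group <- factor(rep(c("healthy", "cancer"), times = c(n_healthy, n_cancer)))

# Repeatedly draw a random subset of the larger group (same size as the
# smaller group, without replacement) and run a Wilcoxon rank-sum test
# for every column of x. Returns a features x n_rep matrix of p-values.
resample_wilcox <- function(x, group, small, large, n_rep = 50) {
  small_idx <- which(group == small)
  large_idx <- which(group == large)
  replicate(n_rep, {
    sub_idx <- sample(large_idx, length(small_idx))
    vapply(x, function(feature) {
      wilcox.test(feature[small_idx], feature[sub_idx])$p.value
    }, numeric(1))
  })
}

pvals <- resample_wilcox(expr, group, small = "healthy", large = "cancer",
                         n_rep = 50)
dim(pvals)                            # genes x repetitions
median_p <- apply(pvals, 1, median)   # one possible per-gene summary
```

Because the function only needs a table with samples in rows and a grouping vector, the same call should work for an OTU table. Note that the 50 p-values per gene are not independent (the healthy samples are reused and the cancer subsets overlap), so how to summarise them, for example by median, is a judgment call, and a multiple-testing correction such as `p.adjust` would still be needed across genes.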
Is it RNA-Seq or microarray? As already mentioned, you should use a dedicated tool and not try to reinvent the wheel. For RNA-Seq, use raw counts with DESeq2 or edgeR; for microarray, use limma.
This is the third time that I have gotten this answer. I have explained why I want to do this. Please add a comment if you can give any new feedback.