2.9 years ago
Vitor1
Hi guys,
I was wondering: is it wrong to use only a random subset of a dataset?
For example, take a dataset of 40 samples (let's say 20 control and 20 treated). Is it a mistake to pick 20 random samples from it (preserving the control/treated ratio) and run the analysis on those?
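The subsetting described above (half of the samples, keeping the control/treated ratio) is stratified random sampling. Here is a minimal Python sketch. It assumes hypothetical sample IDs (`ctrl_01`, `trt_01`, and so on) and a fixed seed, so the exact subset can be reproduced and reported. The helper `stratified_subset` is illustrative only and is not from any particular package.

```python
import random

def stratified_subset(groups, fraction, seed=None):
    """Draw the same fraction from each group (stratum), preserving group ratios."""
    rng = random.Random(seed)  # seeded RNG so the exact subset can be reproduced
    subset = {}
    for name, samples in groups.items():
        k = round(len(samples) * fraction)  # samples to keep from this group
        subset[name] = sorted(rng.sample(samples, k))  # sampling without replacement
    return subset

# Hypothetical sample IDs: 20 controls and 20 treated.
groups = {
    "control": [f"ctrl_{i:02d}" for i in range(1, 21)],
    "treated": [f"trt_{i:02d}" for i in range(1, 21)],
}

subset = stratified_subset(groups, fraction=0.5, seed=42)
for name, ids in subset.items():
    print(name, len(ids))  # 10 per group, so the 1:1 ratio is kept
```

If you report both the seed and the selected sample IDs, others can reproduce exactly which samples went into the analysis.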
I know this is far from ideal, and I tend to think it should not be done, but I was wondering whether it is actually wrong from a research perspective.
Thanks
Nothing in life is a mistake once we learn from it, in which case the 'mistake' just becomes experience. However, what is the justification for random subsetting?
I was just wondering about the research ethics behind this, rather than having a specific justification in mind. It might be done for faster analysis (alignment, etc.) or because of hard drive space limitations, something like that.
I don't think that would be a valid reason for publication in a journal. Does your group lack funding?