Is it wrong to use only a random subset of a dataset?

3.4 years ago

vitor ▴ 130

Hi guys,

I was wondering: is it wrong to use only a random subset of a dataset?

For example, take a dataset that contains 40 samples (20 control and 20 treated, let's say). Is it a mistake if a person takes 20 random samples from the dataset (preserving the control/treated ratio) to do the analysis?
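
As an illustration, the subsetting described above could look like the following minimal Python sketch. The sample IDs, the `stratified_subset` helper, and the seed value are all hypothetical; the sketch simply draws 10 of the 20 samples from each group, without replacement, so the 1:1 control/treated ratio is preserved.

```python
import random


def stratified_subset(samples, n_per_group, seed=0):
    """Draw n_per_group sample IDs from each group, without replacement."""
    # A fixed seed makes the draw reproducible, so the exact subset can be reported.
    rng = random.Random(seed)
    return {
        group: sorted(rng.sample(ids, n_per_group))
        for group, ids in samples.items()
    }


# Hypothetical sample IDs: 20 control and 20 treated.
samples = {
    "control": [f"ctrl_{i:02d}" for i in range(1, 21)],
    "treated": [f"trt_{i:02d}" for i in range(1, 21)],
}

subset = stratified_subset(samples, n_per_group=10, seed=42)
print({group: len(ids) for group, ids in subset.items()})
# prints {'control': 10, 'treated': 10}
```

Recording the seed and the resulting sample list would let anyone else reproduce exactly which 20 samples went into the analysis.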

I know that this is far from ideal, and I tend to think that it should not be done, but I was wondering whether it is actually wrong from a research perspective.

Thanks

ethics research dataset • 1.1k views

updated 2.1 years ago by Ram 45k • written 3.4 years ago by vitor ▴ 130

Nothing in life is a mistake once we learn from it; the 'mistake' then just becomes experience. However, what is the justification for random subsetting?

3.4 years ago by Kevin Blighe 89k

I was just wondering about the research ethics behind this; I don't really have a justification in mind. Maybe it would be for a faster analysis (alignment, etc.) or because of hard drive space limitations, something like that.

3.4 years ago by vitor ▴ 130

I don't think that would be a valid reason for publication in a journal. Does your group lack funding?

3.4 years ago by Kevin Blighe 89k
