How To Calculate The Sample Sizes Needed To Discover Motifs
10.8 years ago
prabha ▴ 10

Hi,

I need help calculating sample sizes for my project, and I would really appreciate input from any statisticians. The project is about identifying motifs in a set of DNA sequences. My proposed approach is as follows: given, for example, a set of 100 DNA sequences, I would make subgroups of 10 sequences each by random sampling, and I expect motif identification to be very efficient on these subgroups. I would like to know how to justify this approach statistically, that is, how to calculate how many sequences need to be in each subgroup and how many subgroups are needed.

Thanks a lot for coming forward to help,

Prabhakaran

10.8 years ago
Sameet ▴ 300

There is a slight problem with your approach. If you are looking for a motif that is potentially present in all the sequences, this method may not work very well; in fact, there is only a slight chance it will work at all. I would instead draw at least 50% of the sequences at random, determine a motif, and repeat this process thousands of times (the exact number depending on how many sequences are in the input). Ideally you would want all 100C50 (100 choose 50) possible samples drawn. Even this approach is very naive. If you give a better description of the problem, we will be able to help more.
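
To give a feel for the numbers in this suggestion, here is a minimal Python sketch. It counts how many distinct half-size samples exist for a given input size and draws a fixed number of random samples instead of enumerating them. The function names, the default seed, and the use of integer IDs in place of real sequences are assumptions made for illustration; the motif-finding step itself is not shown.

```python
import math
import random


def n_possible_subsets(n_sequences, subset_size):
    """Number of distinct subsets of the given size (n choose k)."""
    return math.comb(n_sequences, subset_size)


def draw_subsets(sequence_ids, subset_size, n_draws, seed=0):
    """Draw n_draws random subsets, each sampled without replacement.

    The seed is only there to make the illustration reproducible.
    """
    rng = random.Random(seed)
    return [rng.sample(sequence_ids, subset_size) for _ in range(n_draws)]


if __name__ == "__main__":
    ids = list(range(100))  # hypothetical stand-ins for 100 DNA sequences
    print("possible 50-sequence samples:", n_possible_subsets(100, 50))
    subsets = draw_subsets(ids, subset_size=50, n_draws=1000)
    print("samples actually drawn:", len(subsets))
```

The count of possible samples is far too large to enumerate, which is why a few thousand random draws are the practical stand-in for "all 100C50 samples".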

All the best.


Thanks for your reply. In my approach I would create, for example, 100 subgroups of 10 randomly selected sequences each, so that all 100 DNA sequences are selected for analysis. My idea is to create subgroups, and I would like to know how many sequences need to be in each subgroup and how many subgroups are needed for the 100 sequences. Also, the motifs need not be present in all the sequences.
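
One possible way to put numbers on the subgroup size and subgroup count is sketched below in Python. It rests on assumptions that are not stated in the thread: that the motif occurs in some number of the sequences (which in practice is unknown and would have to be guessed), that each subgroup is drawn without replacement, and that subgroups are drawn independently of each other. Under those assumptions, the number of motif-carrying sequences in a subgroup follows a hypergeometric distribution. All function and parameter names are hypothetical.

```python
import math


def prob_at_least(n_total, n_with_motif, subgroup_size, min_carriers):
    """P(a random subgroup holds at least min_carriers motif-carrying sequences).

    Hypergeometric tail: the subgroup is drawn without replacement from
    n_total sequences, n_with_motif of which are assumed to carry the motif.
    """
    denom = math.comb(n_total, subgroup_size)
    upper = min(n_with_motif, subgroup_size)
    total = 0
    for i in range(min_carriers, upper + 1):
        total += (math.comb(n_with_motif, i)
                  * math.comb(n_total - n_with_motif, subgroup_size - i))
    return total / denom


def prob_sequence_included(n_total, subgroup_size, n_subgroups):
    """P(a given sequence appears in at least one of n_subgroups independent draws)."""
    return 1.0 - (1.0 - subgroup_size / n_total) ** n_subgroups


if __name__ == "__main__":
    # Assumed scenario: 100 sequences, motif present in 20 of them.
    for size in (10, 25, 50):
        p = prob_at_least(100, 20, size, min_carriers=3)
        print(f"subgroup size {size}: P(>= 3 carriers) = {p:.3f}")
    print("P(a sequence is used at least once, 100 subgroups of 10):",
          prob_sequence_included(100, 10, 100))
```

Varying the assumed number of motif-carrying sequences and the minimum number of carriers a motif finder needs shows how sensitive the required subgroup size is to those two guesses.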
