Hi All,
I have a question about which statistical test to use on my dataset (over 1,000 SNPs and about 300 sequences). It's primarily made up of SNP data coded as binary 0/1 (i.e. whether the SNP is found or absent), and I want to find the combination of SNPs that gives the highest activity. I've had a look at logistic regression functions (which I read are well suited to binary data of this kind), but most examples I've seen deal with the total number of "0" or "1" observations in the dataset, making it more of an additive approach (i.e. the more SNPs you have, the higher the activity). What I'm after is identifying the particular SNPs, or combinations of SNPs, that have the greatest effect on activity (the set that gives the highest activity may or may not contain a large number of SNPs).
Here's an example of the dataset.
Sequence  SNP1  SNP2  SNP3  ....  Activity
1         0     0     1           760
2         1     0     0           123
3         1     1     0           1009
4         1     0     1           6
.....
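To make the goal more concrete, here is a rough sketch in R of the difference I mean between counting SNPs and giving each SNP (and each pair of SNPs) its own effect. The data are simulated, not my real measurements; the column names, the assumption that 1 means the SNP is present, and the made-up rule for generating activity are all placeholders. I'm also treating activity as a continuous outcome with lm(), which may not be the right choice, and I realise this form would not scale as-is to over 1,000 SNPs with only about 300 sequences.

```r
# Simulated toy data (NOT real measurements): 300 sequences, 3 binary SNP columns.
# Assumption for illustration: 1 = SNP present, 0 = SNP absent.
set.seed(1)
n <- 300
snps <- data.frame(
  SNP1 = rbinom(n, 1, 0.5),
  SNP2 = rbinom(n, 1, 0.5),
  SNP3 = rbinom(n, 1, 0.5)
)

# Made-up rule: activity only rises when SNP1 and SNP2 occur together.
snps$Activity <- 100 + 500 * snps$SNP1 * snps$SNP2 + rnorm(n, sd = 50)

# Additive-count approach: activity as a function of how many SNPs are present.
count_fit <- lm(Activity ~ I(SNP1 + SNP2 + SNP3), data = snps)

# Per-SNP approach: a separate effect for each SNP plus all pairwise interactions.
pair_fit <- lm(Activity ~ (SNP1 + SNP2 + SNP3)^2, data = snps)

# Coefficient table; the SNP1:SNP2 row is the effect of that combination.
print(summary(pair_fit)$coefficients)

# Compare the two fits (lower AIC suggests a better trade-off of fit and complexity).
print(AIC(count_fit, pair_fit))
```

In this toy case the count-based model cannot tell which SNPs matter, while the second model puts the effect on the SNP1:SNP2 term. That second kind of output is what I'm hoping to get for my real data.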
Any help or advice would be much appreciated. I'm learning R, so any examples in that language would be really helpful.