11.4 years ago
TitoPullo
I have a dataset with 2 million SNPs genotyped in 1000 individuals. For each individual, each SNP is coded as 0, 1 or 2 (the number of minor alleles). I'd like to reduce the number of SNPs so that I can use them as attributes (features) for an SVM. Is it a correct approach to calculate r^2 (r squared) for each pair of SNPs and then keep only SNPs whose pairwise r^2 is smaller than 0.8 (I have seen this threshold used in the literature as well)?
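A minimal sketch of the idea in plain Python, assuming genotypes are stored as one list of 0/1/2 dosages per SNP, in genomic order. Computing r^2 for every pair of 2 million SNPs means roughly 2 x 10^12 pairs, so in practice pruning is usually done only between SNPs inside a sliding window; the `window` parameter and the function names `r_squared` and `prune_by_r2` are hypothetical, chosen for illustration. For data of this size, a dedicated tool such as PLINK's `--indep-pairwise` option (window size, step, r^2 threshold) does windowed pruning far faster than pure Python.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Monomorphic SNP: correlation is undefined; treated as uncorrelated here.
        return 0.0
    return cov * cov / (var_x * var_y)


def prune_by_r2(genotypes, threshold=0.8, window=50):
    """Greedy LD-pruning sketch (hypothetical helper, not a library API).

    genotypes: list of SNP rows in genomic order, each a list of 0/1/2 per individual.
    A SNP is kept only if its r^2 with every already-kept SNP among the
    previous `window` SNPs is below `threshold`. Returns the kept row indices.
    """
    kept = []
    for i, snp in enumerate(genotypes):
        # Only compare against kept SNPs that are close by (assumed window rule).
        recent = [k for k in kept if i - k <= window]
        if all(r_squared(genotypes[k], snp) < threshold for k in recent):
            kept.append(i)
    return kept


# Toy example: 3 SNPs x 6 individuals.
snps = [
    [0, 1, 2, 1, 0, 2],  # SNP 0
    [0, 1, 2, 1, 0, 2],  # SNP 1: identical to SNP 0 (r^2 = 1), so it is dropped
    [2, 1, 0, 0, 1, 2],  # SNP 2: r^2 with SNP 0 is 0.0625, so it is kept
]
print(prune_by_r2(snps, threshold=0.8))  # [0, 2]
```

Because the pruning is greedy, the result depends on SNP order: the first SNP of a correlated group is the one that survives.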
You're right, I made a mistake writing the post! Anyway, is this approach biologically sound?
I have never used this approach myself, but if you need to reduce the number of SNPs, I think it is a reasonable one. That is just my personal opinion, though.
Have you ever faced this problem (reducing the number of SNPs)? If so, which approach did you use?
No, I have never had to reduce the number of SNPs, but I think the approach you described is the most common one, so my suggestion is to try it.