Hi everyone,
I've run into the following problem:
I have ~2k artificial sequences, each of length 39,000 (concatenated SNPs). Each sequence has a "sequence type" (or you could call it a genotype), with about 20 groups in total. My task is to determine a minimal SNP set that discriminates between these groups. For example, I want a list of unique SNPs for each group, so that a single SNP is enough to distinguish that group from all the others. If that isn't possible, I want a list of SNPs that lets me say "if you have SNP A and SNP B, this sample is from group X" or "if you have SNP A and you don't have SNP B, this sample is from group Y".
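To make the first case concrete, here is a minimal sketch of what I mean by a "unique" SNP, on toy data. It assumes the sequences are loaded as a samples-by-positions character matrix with no missing calls, and that "unique" means the allele is fixed inside the group and never seen outside it. The names `private_snps`, `genotypes` and `groups` are made up for illustration only.

```python
import numpy as np

def private_snps(genotypes, groups):
    """Map each group to the column indices of its group-unique SNPs.

    A column counts as unique for a group when every sample in the group
    carries the same allele and no sample outside the group carries it.
    (This definition is an assumption, not an established standard.)
    """
    genotypes = np.asarray(genotypes)
    groups = np.asarray(groups)
    result = {}
    for g in np.unique(groups).tolist():
        inside = genotypes[groups == g]
        outside = genotypes[groups != g]
        reference = inside[0]  # alleles of the first sample in the group
        # Allele is identical across all samples of the group
        fixed = (inside == reference).all(axis=0)
        # That allele never occurs in any other group
        absent_outside = (outside != reference).all(axis=0)
        result[g] = np.flatnonzero(fixed & absent_outside)
    return result

# Toy data: 5 samples, 3 SNP positions, 3 groups
genotypes = np.array([list("AAC"), list("AAC"),
                      list("GAC"), list("GAT"),
                      list("GCT")])
groups = ["X", "X", "Y", "Y", "Z"]

found = private_snps(genotypes, groups)
for name, cols in found.items():
    print(name, cols.tolist())
```

In this toy case group X is identified by position 0 alone and group Z by position 1 alone, but group Y has no unique SNP and can only be described by a combination (G at position 0 and A at position 1). That second case is the part I don't know how to solve in general.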
Is there a simple solution for this?