Question

Tag SNP Selection

0

11.4 years ago

TitoPullo ▴ 190

I need to use SNP data for a prediction problem. I have around 2 million SNPs, so I need to pre-filter the data before I can use it. I found two main tools: Tagger and SNAP. Unfortunately, both tools require selecting a population sample, and my data come from several populations (9 subpopulations in total). Is there any other tool that can select the most informative SNPs regardless of population characteristics?

snp selection • 3.5k views

ADD COMMENT • link updated 11.4 years ago by Jorge Amigo 14k • written 11.4 years ago by TitoPullo ▴ 190

score 1 · Answer 1 · 2013-07-01

1

11.4 years ago

Jorge Amigo 14k

If you don't want to include any prior knowledge in the selection, wouldn't the LD patterns from HapMap be enough to define your selection? Anyway, if you're looking for a particular tool, some of my colleagues prefer Pupasuite because it allows them to select SNPs according to their functional properties.

ADD COMMENT • link 11.4 years ago by Jorge Amigo 14k

0

But I'd like to use the data I already have (i.e., the SNP genotype of each individual in my dataset).

ADD REPLY • link 11.4 years ago by TitoPullo ▴ 190

0

tagSNPs are used to pick a limited number of markers that remain informative about the rest. What you are describing is not really finding tagSNPs, but prioritizing the SNP results you already have. So you have already genotyped your samples, and what you are asking is which SNPs to use in your study? If you don't want to deal with population and stratification issues, the only thing you can safely do is filter out monomorphic SNPs, as they won't be informative at all. Everything from then on, including LD patterns and of course allele frequencies, depends heavily on population information, so unless you explain why you would want to "pre-filter" your data, I can't see a benefit in it. Keep in mind that most tools that analyze high-throughput genotyping data, like PLINK for instance, can load all the experimental data at once, and they are the ones that "pre-filter" your data when needed.
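
To illustrate the monomorphic filter mentioned above, here is a minimal sketch in plain Python. It assumes genotypes are coded as 0, 1 or 2 copies of the alternate allele with `None` for missing calls, stored as one list per individual; that encoding and the function names are assumptions made for this example, not something taken from the poster's data.

```python
from typing import List, Optional, Sequence

# Assumed encoding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def is_monomorphic(calls: Sequence[Genotype]) -> bool:
    """True if only one allele is seen among the called genotypes of a SNP."""
    called = [g for g in calls if g is not None]
    alt_alleles = sum(called)
    total_alleles = 2 * len(called)
    # No alternate alleles, or only alternate alleles, means no variation.
    # A SNP with no calls at all also ends up here and is treated as uninformative.
    return alt_alleles == 0 or alt_alleles == total_alleles


def polymorphic_snp_indices(genotypes: Sequence[Sequence[Genotype]]) -> List[int]:
    """Return the column indices of SNPs that vary in the sample.

    `genotypes` is laid out as individuals x SNPs.
    """
    if not genotypes:
        return []
    n_snps = len(genotypes[0])
    return [
        j
        for j in range(n_snps)
        if not is_monomorphic([row[j] for row in genotypes])
    ]
```

For example, `polymorphic_snp_indices([[0, 1, 2], [0, 0, 2], [0, 2, 2]])` keeps only the middle SNP. With around 2 million SNPs a list of lists is unlikely to be practical, so this is only meant to show the logic; a dedicated tool or an array library would be the realistic choice.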

ADD REPLY • link 11.4 years ago by Jorge Amigo 14k

0

I want to use SNPs as attributes for a classification problem (i.e., given an individual with known SNPs plus other information, predict whether they belong to a particular class, for example whether they are ill or not). Unfortunately I can't use the full set of SNPs (there are almost 2 million), so I want to select the "most informative" ones. I've never worked with SNP data before, so sorry for any imprecision!

ADD REPLY • link 11.4 years ago by TitoPullo ▴ 190

0

This sounds like an association study, and PLINK, as mentioned, can help you. What I don't understand is why you can't use all those SNPs, because the tools that handle PCA, association studies, and the rest of the typical analyses performed on microarray data are definitely capable of handling large numbers of genotypes. You certainly have an idea of how to proceed, but it isn't clear to me.
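
To make "association study" a bit more concrete, here is a rough sketch of a single-marker case/control test: a 2x2 allelic chi-square statistic computed per SNP, then used to rank SNPs. It is an illustration written for this thread, not PLINK's code; the genotype encoding (0/1/2 alternate-allele copies, `None` for missing) and the function names are assumptions. It also ignores population stratification entirely, which matters with 9 subpopulations.

```python
from typing import List, Optional, Sequence

# Assumed encoding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def allelic_chi2(calls: Sequence[Genotype], is_case: Sequence[bool]) -> float:
    """2x2 allelic chi-square (1 df, no continuity correction) for one SNP."""
    a = b = c = d = 0  # a/b: alt/ref alleles in cases, c/d: alt/ref in controls
    for g, case in zip(calls, is_case):
        if g is None:
            continue  # skip missing calls
        if case:
            a += g
            b += 2 - g
        else:
            c += g
            d += 2 - g
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Monomorphic SNP, or no called cases or controls: no evidence either way.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_snps(
    genotypes: Sequence[Sequence[Genotype]],
    is_case: Sequence[bool],
    top_k: int,
) -> List[int]:
    """Return indices of the `top_k` SNPs with the largest statistic.

    `genotypes` is laid out as individuals x SNPs; ties keep the lower index first.
    """
    n_snps = len(genotypes[0]) if genotypes else 0
    scored = [
        (allelic_chi2([row[j] for row in genotypes], is_case), j)
        for j in range(n_snps)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:top_k]]
```

A ranking like this is the simplest way to turn per-SNP association results into a shortlist of attributes, but it scores each SNP in isolation and treats all individuals as one population, so it should be read as a starting point only.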

ADD REPLY • link 11.4 years ago by Jorge Amigo 14k

0

Because using 2 million attributes is infeasible for any prediction algorithm!

ADD REPLY • link 11.4 years ago by TitoPullo ▴ 190

0

That's why the programs that deal with this kind of data try to reduce the problem themselves. What I don't understand is why you want to do it yourself, unless you are trying to develop a new algorithm. In that case, again, this is not explained in your question.

ADD REPLY • link 11.4 years ago by Jorge Amigo 14k