6.2 years ago
Jeremy Leipzig
Let's say you were given a microarray probeset, consisting of sparsely scattered SNPs of Allele A and B. If you didn't know from which genome or freeze it was derived, how would you find out?
Eventually BLAT, but a normal BLAST search should throw some light on the genome in question as a start. How long are the probes?
I only have the SNPs that the probes interrogate, not the actual probe sequences.
Did you download the manufacturer's library information? Those files usually contain the full target sequence.
It sounds like there are only the SNPs and no other information.
Right, this is a complete mystery, so I need a very sparse, gappy aligner, or something that can take a list of chromosomal coordinates and extract the bases at those coordinates from every genome on earth.
Was this a commercial array? That should narrow the search space down some. What kind of array?
To narrow it down further, perhaps you could just get the chromosome count and minimum chromosome lengths from the SNP list? This assumes the sequence names for the SNPs are informative and approximate the chromosome names in available databases. If the SNP list names more chromosomes than a candidate reference has, you could rule that reference out. Then you could rule out references whose chromosomes are shorter than the largest SNP coordinate on them. It might take some doing to check for variations in naming, though.
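The filtering idea above can be sketched roughly as follows. This is a minimal sketch, assuming the SNP list is available as (chromosome, position) pairs and that chromosome lengths for each candidate assembly have been collected into a dictionary. The function names and the `chr` prefix normalization are hypothetical choices for illustration, and real naming differences may need more handling than this.

```python
def normalize_chrom(name):
    """Strip a leading 'chr' and upper-case, so 'chr1', 'Chr1' and '1' compare equal."""
    name = name.strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    return name.upper()


def max_position_per_chrom(snps):
    """Return the largest SNP coordinate seen on each (normalized) chromosome."""
    maxima = {}
    for chrom, pos in snps:
        key = normalize_chrom(chrom)
        maxima[key] = max(maxima.get(key, 0), pos)
    return maxima


def compatible_assemblies(snps, assemblies):
    """Keep assemblies that contain every SNP chromosome and are long enough.

    assemblies: dict of assembly name -> dict of chromosome name -> length.
    """
    maxima = max_position_per_chrom(snps)
    keep = []
    for name, lengths in assemblies.items():
        norm = {normalize_chrom(c): n for c, n in lengths.items()}
        # Rule out an assembly if a SNP chromosome is missing from it,
        # or if any SNP coordinate lies beyond the end of that chromosome.
        if all(c in norm and pos <= norm[c] for c, pos in maxima.items()):
            keep.append(name)
    return keep
```

This only rules candidates out; several assemblies of the same species will usually survive, so it is a first pass rather than an identification.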
The only thing that seems to be available is a list of chromosomal coordinates and a base to go with each coordinate.
It is a custom array derived from some freeze of a genome.
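With only coordinates and alleles to go on, one way to guess the freeze is to look up the base at each coordinate in every candidate assembly and see which assembly agrees with the listed alleles most often. The sketch below assumes 1-based coordinates, alleles reported on the forward strand, and candidate sequences already loaded into memory as plain strings keyed by chromosome name. All names are hypothetical, and a real run would need an indexed FASTA reader rather than in-memory strings.

```python
def concordance(snps, genome):
    """Fraction of SNPs whose reference base matches one of the listed alleles.

    snps:   iterable of (chrom, pos_1based, alleles), e.g. ("1", 1042, "AG")
    genome: dict of chromosome name -> sequence string
    SNPs on a missing chromosome or past its end count as mismatches.
    """
    total = matched = 0
    for chrom, pos, alleles in snps:
        total += 1
        seq = genome.get(chrom)
        if seq is None or not 1 <= pos <= len(seq):
            continue
        if seq[pos - 1].upper() in {a.upper() for a in alleles}:
            matched += 1
    return matched / total if total else 0.0


def rank_candidates(snps, genomes):
    """Return (assembly name, concordance) pairs, best match first."""
    scores = {name: concordance(snps, g) for name, g in genomes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

If the array was designed against a given freeze, that freeze should match at nearly every position, while a wrong assembly should match only about as often as chance allows. Alleles reported on the opposite strand would need complementing before the comparison.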