I would like to be able to draw a random list of N SNPs from dbSNP/UCSC. If I have a list of HapMap SNPs in BED format, for instance, I can shuffle them and select 1000 at random. Since the placement of HapMap SNPs is not necessarily representative of all SNPs in the genome, I'd like to do this with dbSNP instead. Short of downloading a BED file of every SNP in the genome, which might be computationally intensive to resample from, is there an easy way to draw some number of random SNPs from the genome and have them returned in BED format?
To do this with a list of SNPs in a BED file, I currently use the shuf command, as shown below. But repeatedly resampling 10k random SNPs from the 56M SNPs currently in dbSNP this way might be too intensive. Ideas? R, perhaps? Is there any way to do this from the Unix prompt so I can use the output in bedtools?
cat file.bed | head
chr1 235638 235751 13.6663
chr1 237748 237784 6.35761
chr1 521484 521614 10.0359
chr1 565575 566082 7.19007
chr1 567523 567873 10.5674
chr1 568176 568545 5.7313
chr1 569748 570042 652.342
chr1 664708 664756 6.32348
shuf file.bed | head
chr3 138552319 138553474 56.8719
chr12 7695465 7695792 11.469
chr20 23312538 23312926 6.68979
chr14 87802700 87802821 6.09238
chr2 180293340 180293591 4.35159
chr18 60279291 60279551 7.28719
chr19 49068267 49068726 34.7679
chr12 60729653 60729899 20.4301
chr2 30458084 30458522 65.6261
chr12 63695225 63695404 4.89757
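If memory is the main concern, one possible approach is reservoir sampling (Algorithm R). It draws a uniform random subset of N lines in a single pass while keeping only N lines in memory, so the 56M-line file never has to be held or shuffled as a whole. The sketch below is a minimal example under stated assumptions: `sample_bed` is a hypothetical bash function (not a bedtools or UCSC tool), `snps.bed` is a hypothetical BED file of dbSNP positions, and lines starting with `track`, `browser` or `#` are assumed to be headers and are skipped. It still reads the whole file once per draw.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bedtools): sample_bed N FILE [SEED]
# Prints a uniform random subset of N data lines from FILE using
# reservoir sampling (Algorithm R): one pass, memory proportional to N.
sample_bed() {
  local n="$1" file="$2" seed="${3:-1}"
  awk -v n="$n" -v seed="$seed" '
    BEGIN { srand(seed) }                # fixed seed -> reproducible draw
    /^(track|browser|#)/ { next }        # assumption: skip header lines
    {
      count++
      if (count <= n) {
        res[count] = $0                  # fill the reservoir first
      } else {
        j = int(rand() * count) + 1      # uniform integer in 1..count
        if (j <= n) res[j] = $0          # replace a slot with prob n/count
      }
    }
    END {
      k = (count < n) ? count : n        # file may have fewer than n lines
      for (i = 1; i <= k; i++) print res[i]
    }' "$file"
}

# Example usage (hypothetical file names), three draws of 10k SNPs:
# for seed in 1 2 3; do
#   sample_bed 10000 snps.bed "$seed" | sort -k1,1 -k2,2n > "random_${seed}.bed"
# done
```

The sampled lines come out in reservoir order rather than genomic order, so sorting them with `sort -k1,1 -k2,2n` before handing them to bedtools is probably wise. The same seed reproduces the same draw with a given awk implementation, and different seeds give different draws.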
Maybe this could help you: https://code.google.com/p/bedtools/wiki/Usage#shuffleBed