Hi,
I have been trying to prepare a SNP file for my NGS analysis. Basically, the file should contain the following information: rsID, chromosome, position, alleles, allele frequencies, counts, population, minor allele and MAF. The information should be based on hg19 (build 37.3) and should include population information.
HapMap releases (rel 28 & 27) are based on NCBI build 36 and dbSNP b126. Moreover, BioMart MartView only allows retrieval of rel 27. If I want to use rel 28, I'll have to parse the data from the FTP site and build a file of my own.
The 1000 Genomes Project says "The 1000 genomes snp and short indel all get submitted to dbSNP and are available from version 132".
The latest release of dbSNP (135) has 1000 Genomes data annotated to it with population information.
So, if I want the latest SNP information with the fields mentioned above, which database should I go for?
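For the allele frequency and MAF columns, here is a minimal sketch of how they could be derived from per-population allele counts, whichever database the counts end up coming from. The function names and the example counts are made up for illustration, and the sketch assumes biallelic sites only.

```python
def allele_frequencies(counts):
    """Turn {allele: count} into {allele: frequency}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_and_maf(counts):
    """Return (minor allele, MAF) for a biallelic site."""
    if len(counts) != 2:
        # Multi-allelic sites need a convention of their own; not handled here.
        raise ValueError("this sketch only handles biallelic sites")
    freqs = allele_frequencies(counts)
    # Lowest frequency wins; ties are broken by allele name so the result is stable.
    minor = min(freqs, key=lambda allele: (freqs[allele], allele))
    return minor, freqs[minor]


# Invented counts for one SNP in one population.
example_counts = {"A": 150, "G": 50}
print(allele_frequencies(example_counts))
print(minor_allele_and_maf(example_counts))
```

Running this per population gives one row per rsID and population, which matches the file layout described above.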
If you want the most complete set, definitely 1000 Genomes phase I.
Complete Genomics is going to release a fair number of genomes publicly soon.
Thanks, but 1000 Genomes has already been integrated into dbSNP 135; I just wasn't sure whether the population info was annotated along with the SNP information. I am currently looking through their FTP site to see if I can retrieve it. Sure, I'll look at the Complete Genomics data once it is released!
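If the files on the FTP site are VCFs, the population frequencies would most likely sit in the INFO column. As far as I know, the 1000 Genomes phase 1 VCFs carry an overall AF key plus per-population keys such as EUR_AF, but treat those key names as an assumption and check the file header. A rough sketch of pulling them out of one record (the record below is invented, and biallelic sites are assumed):

```python
def parse_population_af(vcf_line):
    """Extract rsID, position, alleles and *_AF values from one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, rsid, ref, alt = fields[0], int(fields[1]), fields[2], fields[3], fields[4]

    # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags.
    info = {}
    for item in fields[7].split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else None  # None marks a flag without a value

    # Assumed naming: per-population frequencies end in "_AF" (e.g. EUR_AF).
    pop_af = {
        key[:-3]: float(value)
        for key, value in info.items()
        if key.endswith("_AF") and value is not None
    }
    overall = float(info["AF"]) if info.get("AF") is not None else None
    return {"rsid": rsid, "chrom": chrom, "pos": pos, "ref": ref, "alt": alt,
            "af": overall, "pop_af": pop_af}


# Invented record, for illustration only.
line = "1\t12345\trs0000001\tG\tA\t100\tPASS\tAF=0.25;EUR_AF=0.5;AFR_AF=0.125;VT=SNP"
print(parse_population_af(line))
```

Header lines (those starting with "#") would need to be skipped before calling this.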
If you are using Illumina Omni2.5 chips, HapMap data from those chips is now available (you may have to contact Illumina directly). You can also always lift over coordinates from build 36 to 37 if necessary.
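In practice the lift-over would be done with a tool such as UCSC liftOver and a build 36 (hg18) to build 37 (hg19) chain file. To show the idea only, here is a toy sketch: a position is converted if it falls inside an aligned block, and dropped otherwise. The blocks are hypothetical and not taken from any real chain file.

```python
from bisect import bisect_right

# Hypothetical aligned blocks for one chromosome, sorted by old_start:
# (old_start, new_start, length), all 0-based.
BLOCKS = [(0, 0, 1000), (1000, 1050, 500), (1600, 1650, 400)]


def lift_position(pos, blocks=BLOCKS):
    """Map a position from the old build to the new one, or return None if unmapped."""
    starts = [block[0] for block in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    old_start, new_start, length = blocks[i]
    if pos >= old_start + length:
        return None  # pos falls in a gap between aligned blocks
    return new_start + (pos - old_start)


for old_pos in (500, 1200, 1550):
    print(old_pos, lift_position(old_pos))
```

SNPs that come back unmapped would have to be dropped or checked by hand, which is one reason to prefer a source already on build 37 when one exists.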
Thanks Dan. Sure, I'll also check with Illumina to see if they have the HapMap data.