Download full list of SNPs and their coordinates in hg38
6.7 years ago
gaelgarcia ▴ 270

What is the best / standard place to get a full list of SNPs and their coordinates in hg38?

I downloaded the SNPsnap database, but just realized that those coordinates are in hg19.

I'm trying to figure out how many SNP sites exist in my targeted genome sequencing data.

Many thanks.

SNP hg38 SNPsnap DBsnp population genetics • 23k views
6.7 years ago

One can download it in many formats by first going here and then choosing the dbSNP build version and the human genome reference build:

For example, human_9606_b151_GRCh38p7 is dbSNP release version 151 with co-ordinates for GRCh38.p7.

The VCF format is a common download. In the VCF directory, the 00-All.vcf.gz file is the one that contains all records. Take a look at the READMEs in order to see what's in all of the other files.

Kevin


Thank you for your clear response, Kevin. I see that I can download the list in BED format, but there doesn't appear to be a file with all chromosomes; instead, there is one file per chromosome. Is there a reason why one can't download the full list in BED format?


@OP: Download 00-All.vcf.gz (which contains all the variants), then convert the VCF to BED using vcf2bed.
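
For readers without vcf2bed (part of BEDOPS) installed, the VCF-to-BED conversion can be approximated with awk. This is only a sketch: the function name `vcf_to_bed` and the six-column output layout are assumptions made for illustration and do not reproduce vcf2bed's exact output.

```shell
# Sketch: VCF -> BED with awk (a stand-in for vcf2bed, not its exact output).
# BED is 0-based and half-open: start = POS - 1, end = start + length(REF).
vcf_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }                      # skip VCF header lines
       { start = $2 - 1
         print $1, start, start + length($4), $3, $4, $5 }'
}

# Usage (output file name is a placeholder):
#   gzip -cd 00-All.vcf.gz | vcf_to_bed > 00-All.bed
```

The output columns are chromosome, start, end, rsID, REF, and ALT. Streaming from `gzip -cd` avoids writing the uncompressed VCF to disk.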


Good idea, @cpad0112. Enjoy your bank holiday Monday.


You too, @kevin.


Hello gaelgarcia,

Having one file per chromosome has the advantage that you only need to download one smaller file if you are investigating only a specific region.

If you need all the information in one file, you can concatenate the files into one after downloading.
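
Concatenating the per-chromosome files could look like the sketch below. The function name `concat_bed` and the `bed_chr_*.bed.gz` file pattern are assumptions, as is the presence of UCSC-style `track` or `browser` header lines, which the sketch drops so they do not end up in the middle of the merged file.

```shell
# Sketch: merge gzipped per-chromosome BED files into a single stream,
# dropping any "track"/"browser" header lines found in the inputs.
concat_bed() {
  for f in "$@"; do
    gzip -cd "$f"
  done | grep -v -E '^(track|browser)'
}

# Usage (file names are placeholders); sort afterwards because the shell
# expands the glob in lexicographic, not chromosome, order:
#   concat_bed bed_chr_*.bed.gz | sort -k1,1 -k2,2n > dbsnp_all.bed
```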

fin swimmer


Yes, as fin swimmer says. Large datasets are typically made available on a per-chromosome basis. The VCF version of dbSNP should contain all variants across all chromosomes, though it is a very large file (over 10 GB).


Great - thanks again.

5.5 years ago
Shicheng Guo ★ 9.6k

dbSNP153.hg19.vcf

# Work inside the target directory so every command sees the same file
cd ~/hpc/db/hg19
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.bgz -O dbSNP153.hg19.vcf.bgz
tabix -p vcf dbSNP153.hg19.vcf.bgz
# Optional: write an uncompressed copy (very large)
zcat dbSNP153.hg19.vcf.bgz > dbSNP153.hg19.vcf

dbSNP153.hg38.vcf

cd ~/hpc/db/hg38
wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.bgz -O dbSNP153.hg38.vcf.bgz
tabix -p vcf dbSNP153.hg38.vcf.bgz
zcat dbSNP153.hg38.vcf.bgz > dbSNP153.hg38.vcf

Hi! Is build 152 the latest dbSNP release? Also, I noticed that in the build 152 file you shared, the first column contains RefSeq accession numbers (NC/NT) instead of chromosome numbers as in build 151. Do you know how I can convert the file?


I don't think dbSNP152 is the latest version; I think dbSNP153 is. However, the above command will give you the latest dbSNP release regardless of whether it is 152 or 153. Maybe I should change 152 to 153 in my answer above.

Check these two files; they will help you convert the NC/NT accessions to chr1, chr2, and so on:

https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.25_GRCh37.p13_assembly_report.txt

https://raw.githubusercontent.com/Shicheng-Guo/AnnotationDatabase/master/GCF_000001405.38_GRCh38.p12_assembly_report.txt
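
One way to apply such an assembly report is sketched below with awk. The function name `rename_chroms` is hypothetical, and the column positions (RefSeq-Accn in column 7, UCSC-style-name in column 10) are an assumption about the report layout that should be checked against the file's own header line. Where bcftools is available, `bcftools annotate --rename-chrs` with a two-column mapping file is probably the more robust route.

```shell
# Sketch: rename RefSeq accessions in column 1 of a VCF to UCSC-style names
# (chr1, chr2, ...) using an NCBI assembly report.
# Assumption: report column 7 = RefSeq-Accn, column 10 = UCSC-style-name.
rename_chroms() {
  report=$1
  awk 'BEGIN { FS = OFS = "\t" }
       { sub(/\r$/, "") }                 # tolerate CRLF line endings
       NR == FNR {                        # first input: the assembly report
         if ($0 !~ /^#/ && $10 != "na") map[$7] = $10
         next
       }
       /^#/ { print; next }               # VCF header lines pass through
       ($1 in map) { $1 = map[$1] }       # rename when a mapping exists
       { print }' "$report" -
}

# Usage (output file name is a placeholder):
#   gzip -cd dbSNP153.hg38.vcf.bgz \
#     | rename_chroms GCF_000001405.38_GRCh38.p12_assembly_report.txt > dbSNP153.hg38.chr.vcf
```

Records whose accession has no UCSC-style name in the report are left unchanged, and any `##contig` header lines are not rewritten, so downstream tools that validate contig names may still need a header fix.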

Hi Shicheng Guo

Do you know whether simply renaming the chromosomes to their equivalents (using the table you shared above) can be done without major problems?

2.9 years ago

The accepted answer links to a location that has not been updated since build 151. The current build (155 at the time of writing) is available at https://ftp.ncbi.nlm.nih.gov/snp/latest_release/ (in VCF and compressed JSON formats). For VCF, use GCF_000001405.25.gz for GRCh37 and GCF_000001405.39.gz (or newer) for GRCh38. Builds after 151 are archived at https://ftp.ncbi.nlm.nih.gov/snp/archive/


Correction: the GRCh38 file is GCF_000001405.39.gz.


Thanks, that was a copy-paste error, I edited the answer.


Thank you for posting
