Question

Download SNP Annotation

2.5 years ago

fernandogs97 ▴ 30

Hello, I am following a tutorial (https://hakyimlab.github.io/psychencode/generate_weights.html) which says that I need a dbSNP 150 reference table for the hg19 build of the human genome, containing the columns "chromosome, position, ref, alt, rsid, and dbSNPBuildID". I am stuck at this step. Can anyone tell me where I can find this information?

SNP dbSNP GWAS NCBI • 1.7k views

score 4 · Accepted Answer · 2022-06-08

2.5 years ago

Matthias Zepper 5.0k

Since you are working in R, you can often use annotation provided as part of Bioconductor. The latest Bioconductor release, for example, contains dbSNP version 150 as well as newer versions.

However, the Bioconductor annotation refers to a different reference genome build (hg38 instead of the long-outdated hg19). Older (ancient!) Bioconductor releases might still contain hg19 coordinates, but it is better to use LiftOver to convert the latest annotation back to the older reference genome build.

Alternatively, the current Bioconductor release also contains an hg19 annotation from dbSNP 135.

Thank you very much for all the information. So, as I understand it, there is no table with all this information available for download?

2.5 years ago by fernandogs97 ▴ 30

There is, but why would you want to download the table and manually get it into R when someone else has already done the tedious work for you and the annotation is available as a ready-made Bioconductor annotation package?

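if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SNPlocs.Hsapiens.dbSNP150.GRCh38")
library(SNPlocs.Hsapiens.dbSNP150.GRCh38)

## Get the positions and alleles of all SNPs on chromosomes 1 and X
## (GRCh38 coordinates; alleles are given as IUPAC ambiguity codes):
snptable <- snpsBySeqname(SNPlocs.Hsapiens.dbSNP150.GRCh38, c("1", "X"))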

2.5 years ago by Matthias Zepper 5.0k

Thanks for the reply. You're right, it is a very good approach, and I will certainly use that method in the future. The reason I need the table is that the pipeline I am fine-tuning parses this type of file in a Python-based environment. Thank you very much again for your kind reply!

2.5 years ago by fernandogs97 ▴ 30

I see. I just quickly glanced over the tutorial that you are following and saw that it starts out in R, but failed to notice that it later switches to Python. Sorry.

In that case, the quickest option is the UCSC Table Browser. Choose clade: Mammal, genome: Human, assembly: Feb. 2009 (GRCh37/hg19), group: Variation, and track: All SNPs (150) or Common SNPs (150) to download dbSNP version 150 for the hg19 assembly.

Also see this FAQ regarding time-outs and subsets of the data.
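Since the pipeline is Python-based, here is a minimal sketch of turning such a Table Browser download (output format "all fields from selected table", saved gzip-compressed) into chromosome, position, ref, alt and rsid columns. It rests on several assumptions: the file starts with a '#'-prefixed, tab-separated header naming the snp150 columns (chrom, chromStart, name, strand, refUCSC, observed, class); `observed` is reported on the strand given in `strand` while `refUCSC` is on the + strand (please verify this against the table's schema page before relying on it); and only single-nucleotide variants (class "single") are kept. The function names `ucsc_snp_rows` and `convert`, the stripping of the "chr" prefix and the output header are hypothetical choices. The sketch does not produce the dbSNPBuildID column; check the table schema for a corresponding field or take it from another source.

```python
import gzip
from typing import Iterable, Iterator, Tuple

# Complement table used to flip minus-strand alleles onto the + strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def ucsc_snp_rows(
    lines: Iterable[str], strip_chr: bool = True
) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (chromosome, position, ref, alt, rsid) from a UCSC snp table dump.

    Assumes the first line is a '#'-prefixed, tab-separated header naming the
    columns. Only single-nucleotide variants (class == "single") are converted.
    """
    it = iter(lines)
    header = next(it).lstrip("#").rstrip("\n").split("\t")
    col = {name: i for i, name in enumerate(header)}
    for line in it:
        if not line.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        fields = line.rstrip("\n").split("\t")
        if fields[col["class"]] != "single":
            continue  # indels, MNPs etc. would need different handling
        chrom = fields[col["chrom"]]
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]  # hypothetical convention: "1" rather than "chr1"
        # UCSC chromStart is 0-based; the 1-based position is chromStart + 1.
        pos = int(fields[col["chromStart"]]) + 1
        ref = fields[col["refUCSC"]].upper()  # assumed to be on the + strand
        observed = fields[col["observed"]].upper().split("/")
        if fields[col["strand"]] == "-":
            # Assumed: "observed" is given on the SNP's strand, so complement it.
            observed = [a.translate(_COMPLEMENT) for a in observed]
        # One output row per non-reference allele (multi-allelic sites split).
        for alt in observed:
            if alt != ref:
                yield chrom, pos, ref, alt, fields[col["name"]]


def convert(src_path: str, dst_path: str) -> None:
    """Stream a gzipped Table Browser download into a 5-column TSV."""
    with gzip.open(src_path, "rt") as src, open(dst_path, "w") as dst:
        # Hypothetical header; match whatever the downstream pipeline expects.
        dst.write("chromosome\tposition\tref\talt\trsid\n")
        for row in ucsc_snp_rows(src):
            dst.write("\t".join(map(str, row)) + "\n")
```

For example, `convert("snp150.txt.gz", "dbsnp150_hg19.tsv")` (hypothetical file names) would stream the download line by line, so the full table never has to fit in memory. Multi-allelic sites come out as several lines, one per alternative allele, which may or may not be what the pipeline expects.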

2.5 years ago by Matthias Zepper 5.0k

Thank you very much for all the information; it had been difficult for me to find these files. Thanks for your kind support. I think many people will find this information interesting.

2.5 years ago by fernandogs97 ▴ 30