Hello,
I am currently following a tutorial (https://hakyimlab.github.io/psychencode/generate_weights.html) that says I need a dbSNP150 reference table for the hg19 build of the human genome. The table should contain chromosome, position, ref, alt, rsid, and dbSNPBuildID. I am stuck at this step. Can anyone tell me where I can find this information?
Since you are working in R, you can often use annotation provided as part of Bioconductor. For example, the latest Bioconductor release includes dbSNP build 150 and newer builds.
However, the annotation provided by Bioconductor uses a different reference genome build (hg38 instead of the long-outdated hg19). Older (ancient!) versions of Bioconductor might still contain hg19 coordinates, but it is better to use LiftOver to convert the latest annotation back to the older reference genome build.
BiocManager::install("SNPlocs.Hsapiens.dbSNP150.GRCh38")
library(SNPlocs.Hsapiens.dbSNP150.GRCh38)
## Get the positions and alleles of all SNPs on chromosomes 1 and X:
snptable = snpsBySeqname(SNPlocs.Hsapiens.dbSNP150.GRCh38, c("1", "X"))
Thank you very much for all the information. So, if I understand correctly, there is no table with all this information available for download?
There is, but why would you want to download the table and load it into R manually when someone else has already done the tedious work for you and the annotation is available as a Bioconductor annotation package?
Thanks for the reply. You're right, it is a very good approach, and I'll certainly use it in the future. I need the table because the pipeline I am fine-tuning parses this type of file in a Python-based environment. Thank you very much again for your kind reply!
I see. I just quickly glanced over the tutorial that you are following and saw that it starts out in R, but failed to notice that it later switches to Python. Sorry.
In that case, the quickest option is the UCSC Table Browser. Choose clade: Mammal, genome: Human, assembly: Feb. 2009 (GRCh37/hg19), group: Variation, and track: All SNPs (150) or Common SNPs (150) to download dbSNP build 150 for the hg19 assembly.
Also see this FAQ regarding time-outs and subsets of the data.
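Since the pipeline is in Python, the Table Browser export can be turned into a chromosome/position/ref/alt/rsid table with the standard library alone. The following is a minimal sketch. It assumes the download was done with "all fields from selected table" as plain tab-separated text. Under that assumption the file starts with a "#"-prefixed header line that includes the chrom, chromStart, name, strand, refUCSC, observed and class columns. The function names `parse_snp150` and `write_table` are hypothetical. The sketch keeps only single-nucleotide variants, flips alleles reported on the minus strand onto the plus strand, and converts the 0-based `chromStart` into a 1-based position. I am not sure the snp150 table carries a per-rsID dbSNPBuildID, so this sketch does not produce that column. Check what the tutorial's pipeline actually expects there, and whether it wants "chr1" or bare "1" chromosome names.

```python
import csv
import io
import sys
from typing import Dict, Iterable, Iterator, TextIO

# Complement table used to flip minus-strand alleles onto the + strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical output layout; dbSNPBuildID is intentionally not included.
OUTPUT_COLUMNS = ["chromosome", "position", "ref", "alt", "rsid"]


def parse_snp150(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one row per single-nucleotide variant from a Table Browser export."""
    # Assumption: the first line is a header such as "#bin\tchrom\tchromStart\t...".
    header = handle.readline().lstrip("#").rstrip("\n").split("\t")
    reader = csv.DictReader(handle, fieldnames=header, delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["class"] != "single":
            continue  # keep SNVs only; indels and MNPs need different handling
        ref = row["refUCSC"].upper()
        observed = row["observed"].upper().split("/")
        if row["strand"] == "-":
            # 'observed' is given on the strand named in 'strand'; flip it to +.
            observed = [allele.translate(COMPLEMENT) for allele in observed]
        alts = [allele for allele in observed if allele != ref]
        if not alts:
            continue  # nothing left that differs from the reference base
        yield {
            "chromosome": row["chrom"],
            # chromStart is 0-based; +1 gives the 1-based position of the SNV.
            "position": str(int(row["chromStart"]) + 1),
            "ref": ref,
            "alt": ",".join(alts),
            "rsid": row["name"],
        }


def write_table(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write parsed rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(out, fieldnames=OUTPUT_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Synthetic example input with made-up records, not real dbSNP entries.
    example = (
        "#bin\tchrom\tchromStart\tchromEnd\tname\tstrand\trefUCSC\tobserved\tclass\n"
        "0\tchr1\t99\t100\trs1\t+\tA\tA/G\tsingle\n"
        "0\tchr1\t199\t200\trs2\t-\tC\tG/T\tsingle\n"
        "0\tchr1\t299\t301\trs3\t+\tAT\t-/AT\tdeletion\n"
    )
    write_table(parse_snp150(io.StringIO(example)), sys.stdout)
```

For a real download, replace the `io.StringIO(example)` input with `open("snp150_hg19.txt")`. That file name is hypothetical. The sketch streams rows rather than loading everything into memory, which matters for the full All SNPs (150) track.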
Thank you very much for all the information. It has been difficult for me to find these files. Thanks for your kind support; I think many people will find this information useful.