SNP rs ID batch query
1
0
6.2 years ago

Hi everybody:

I have a list of 4000 SNP rs identifiers from dbSNP and I would like to get their orientation (forward/reverse). However, I have not been able to find a straightforward way to do this, as dbSNP apparently no longer supports batch queries.

Any help would be really appreciated.

SNP • 6.2k views
ADD COMMENT
1

I realized by chance that the BED files for dbSNP contain the strand information.

So one option is to first find out which chromosome each of your SNPs is located on, and then query the corresponding BED file for it.
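As a rough illustration of the BED lookup, here is a minimal R sketch. It assumes the per-chromosome file uses the standard six-column BED layout (chrom, start, end, name, score, strand) with the rs ID in the name column, possibly preceded by a track header line. The function name `strand_from_bed` and the file name in the usage comment are hypothetical.

```r
# Hypothetical helper: look up the strand column for a set of rs IDs in a
# BED file. Assumes a six-column BED layout with the rs ID in column 4.
strand_from_bed <- function(bed_path, rs_ids) {
  lines <- readLines(bed_path)
  # Drop track/browser header lines, comment lines and empty lines
  lines <- lines[nzchar(lines) & !grepl("^(track|browser|#)", lines)]
  if (length(lines) == 0) stop("no BED records found in ", bed_path)
  bed <- read.table(
    text = lines, header = FALSE, stringsAsFactors = FALSE,
    quote = "", comment.char = "",
    col.names = c("chrom", "start", "end", "name", "score", "strand")
  )
  # Keep only the requested SNPs
  bed[bed$name %in% rs_ids, c("name", "chrom", "start", "end", "strand")]
}

# Usage (file name is a placeholder):
# strand_from_bed("bed_chr_14.bed", c("rs36563"))
```

For large files, reading everything into memory may be slow; filtering with a command-line tool first would be a reasonable alternative.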

fin swimmer

ADD REPLY
0

Aren't all dbSNP variants given on the forward (+) strand of the reference genome, regardless of gene/transcript?

ADD REPLY
0

No, some percentage of them are given on the reverse (-) strand; I am not entirely sure why this is. For example, rs499479.

ADD REPLY
2
6.2 years ago

Dear Lorenzo,

You can do this in varying ways, one being via biomaRt in R:

library(biomaRt)

# Connect to the Ensembl variation mart (human SNP dataset)
ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

getBM(attributes=c(
    "refsnp_id", "chr_name", "chrom_start", "chrom_end", "chrom_strand",
    "allele", "mapweight", "validated", "allele_1", "minor_allele",
    "minor_allele_freq", "minor_allele_count", "clinical_significance",
    "synonym_name", "ensembl_gene_stable_id"),
    filters="snp_filter", values="rs6025",
    mart=ensembl, uniqueRows=TRUE)

refsnp_id chr_name chrom_start  chrom_end   chrom_strand allele mapweight
rs6025    1        169549811    169549811   1            C/T    1

validated
1000Genomes,Cited,ESP,ExAC,Frequency,gnomAD,HapMap,Phenotype_or_Disease,TOPMed

allele_1 minor_allele minor_allele_freq minor_allele_count
C        TRUE         0.00599042        30

clinical_significance
benign,pathogenic,drug response,risk factor

synonym_name ensembl_gene_stable_id
17284        ENSG00000198734

The value of chrom_strand will be 1 (plus strand) or -1 (minus strand).

------------------------------------

To look up multiple records at the same time, pass a vector of rs IDs to getBM() as the values parameter, for example:

snps <- c("rs1", "rs2", "rs3", ..., "rsx")

getBM(attributes=c(
        "refsnp_id", "chr_name", ...),
        filters="snp_filter", values=snps,
        mart=ensembl, uniqueRows=TRUE)
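With a few thousand rs IDs, a single request can be slow, so it may help to send the IDs in smaller chunks and stack the results. The following is a minimal sketch, not a tested recipe for Ensembl: `query_in_batches` and `fetch_snps` are hypothetical helper names, and the batch size of 200 is an arbitrary assumption.

```r
# Hypothetical helper: split a vector of IDs into chunks, run a query
# function on each chunk, and row-bind the resulting data frames.
query_in_batches <- function(ids, query_fun, batch_size = 200) {
  ids <- unique(ids)
  if (length(ids) == 0) return(data.frame())
  # Assign each ID to a chunk of at most batch_size elements
  chunks <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- lapply(chunks, query_fun)
  do.call(rbind, unname(results))
}

# Hypothetical wrapper around getBM(); 'mart' is the object returned by
# useMart(). Only a subset of the attributes is requested here.
fetch_snps <- function(snps, mart, batch_size = 200) {
  query_in_batches(snps, function(chunk) {
    biomaRt::getBM(
      attributes = c("refsnp_id", "chr_name", "chrom_start",
                     "chrom_strand", "allele"),
      filters = "snp_filter", values = chunk,
      mart = mart, uniqueRows = TRUE)
  }, batch_size)
}

# Usage: res <- fetch_snps(snps, ensembl)
```

Passing the query as a function keeps the chunking logic separate from biomaRt, so it can be checked offline with a dummy query function.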

Regarding strand orientation in dbSNP, you may read the FAQ, starting from: Strand Orientation.

Kevin

ADD COMMENT
0

Dear Kevin, thank you so much for your thorough answer; it was really useful. However, I am afraid it does not work as I expected: when I use biomaRt with rs36563, which I know for certain is in REVERSE (https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=36563), I still get a value of 1 in the strand field:

require(biomaRt)
ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
getBM(attributes=c(
    "refsnp_id", "chr_name", "chrom_start", "chrom_end", "chrom_strand",
    "allele", "mapweight", "validated", "allele_1", "minor_allele",
    "minor_allele_freq", "minor_allele_count", "clinical_significance",
    "synonym_name", "ensembl_gene_stable_id"),
    filters="snp_filter", values="rs36563",
    mart=ensembl, uniqueRows=TRUE)

refsnp_id chr_name chrom_start  chrom_end   chrom_strand allele mapweight
rs36563   14       70885931     70885931    1            T/G    1

This is the same value that I obtain with FORWARD SNPs (such as rs6902771), so I still cannot differentiate them. Any ideas how I can do so?

Thank you so much again!

ADD REPLY
0

Hello,

could you please explain why it is important to know whether a variant is reported on the reverse or forward strand in dbSNP?

fin swimmer

ADD REPLY
0

Hello Fin,

Coming back to the rs36563 example, the reported risk allele for a particular trait is A (https://www.ebi.ac.uk/gwas/search?query=rs36563). However, when I annotate this SNP using biomaRt (as shown above), I find that the alleles are T/G, and that is because the SNP is on the reverse strand.

That is why I would like to know the strand of a batch of variants, so that I can take the complementary nucleotide in those cases where the SNP is on the reverse strand and then check whether it matches the risk allele.

Hope that makes it clearer.

Thank you very much!

ADD REPLY
1

Variants do not have a strand, which is why Ensembl adopts the default behaviour of always showing the forward-strand alleles for all variants. The risk allele reported in papers depends entirely on what the paper's authors decide to do: they may use the forward strand, they may follow the strand of the alleles quoted by dbSNP, or they may do neither.
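Since the strand of a reported risk allele is not guaranteed, one practical check is to compare it against both the forward-strand alleles and their complements. Below is a minimal R sketch of that comparison; the function names `complement_alleles` and `allele_match` are hypothetical, and the allele string is assumed to be in the "T/G" form returned in the allele column. Palindromic SNPs (A/T or C/G) cannot be resolved this way and are flagged as ambiguous.

```r
# Complement each base in an allele string; separators such as "/" are kept.
complement_alleles <- function(alleles) {
  chartr("ACGTacgt", "TGCAtgca", alleles)
}

# Hypothetical helper: report whether a risk allele matches the listed
# alleles directly, only after complementing, both ways, or not at all.
allele_match <- function(risk_allele, alleles) {
  risk <- toupper(risk_allele)
  observed <- toupper(strsplit(alleles, "/", fixed = TRUE)[[1]])
  direct <- risk %in% observed
  flipped <- risk %in% complement_alleles(observed)
  if (direct && flipped) {
    "ambiguous"
  } else if (direct) {
    "direct"
  } else if (flipped) {
    "complement"
  } else {
    "none"
  }
}

complement_alleles("T/G")  # "A/C"
allele_match("A", "T/G")   # "complement"
```

For rs36563, the reported risk allele A matches the T/G alleles only after complementing, which is consistent with the paper quoting the opposite strand.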

ADD REPLY
0

Hi, I tried this method and I'm getting the following error message. Can you please suggest how I can address this issue? Thanks in advance.

Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached: [www.ensembl.org:443] Operation timed out after 300004 milliseconds with 309191 bytes received

ADD REPLY
0

Are you overloading Ensembl's server?

ADD REPLY
0

Please create a new question with the full details of your BioMart query.

ADD REPLY
