How to get REF/ALT allele for a list of rsid and SNP notations
22 days ago
Meisam ▴ 250

I got an incomplete list of variants from an old study, identified only by rsids or by coordinates; see the example below:

Variant Coordination
rs1585453 11:46884713
10:105087598_AT_A 10:105087598
rs145158522 10:106629859
rs180940 10:115722411
11:47277086_CTTTCTTTT_C 11:47277086

...
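A side note on the table above: IDs such as 10:105087598_AT_A look like they already encode the alleles in a CHROM:POS_REF_ALT layout. That ordering is an assumption about how the original study named its variants, not something the list itself confirms. If it holds, those rows need no database lookup, and a minimal sketch along these lines (the function name `parse_variants` is made up for illustration) would pull out VCF-style CHROM, POS, ID, REF, ALT columns while skipping the rsid rows:

```shell
#!/usr/bin/env bash
# Assumption: IDs shaped like CHROM:POS_REF_ALT list REF before ALT.
# Rows keyed by rsid are skipped because they still need a lookup.
parse_variants() {
  awk 'BEGIN { OFS = "\t" }
       $1 ~ /^[^_:]+:[0-9]+_[ACGTN]+_[ACGTN]+$/ {
         split($1, parts, "[_:]")   # parts: chrom, pos, ref, alt
         print parts[1], parts[2], ".", parts[3], parts[4]
       }'
}

# Example input taken from the table above
printf '%s\n' \
  'rs1585453 11:46884713' \
  '10:105087598_AT_A 10:105087598' \
  '11:47277086_CTTTCTTTT_C 11:47277086' | parse_variants
```

Only the two coordinate-style rows are printed, as tab-separated fields with "." in the ID column; the rsid rows remain to be resolved by a query.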

For my subsequent analysis I need to convert this to a standard VCF-style table with reference (REF) and alternate (ALT) allele information. Searching online, I found a few options. One is downloading the large dbSNP reference file, which is not currently possible for me. Another is biomaRt, but it only returns a single "allele" column that lists all the possible alleles for the SNP and does not indicate which one is the reference:

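library(biomaRt)

# Connect to the Ensembl variation mart for human SNPs
snp_mart <- useEnsembl(biomart = "snp", dataset = "hsapiens_snp")

# candidate_variant_id: character vector of rsids to look up
snp_info <- getBM(
   attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end", "chrom_strand", "allele"),
   filters = "snp_filter",
   values = candidate_variant_id,
   mart = snp_mart
)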

output example:

refsnp_id chr_name chrom_start chrom_end chrom_strand allele
rs583104 109278685 109278685 G/A/C/T
rs599839 109279544 109279544 G/C/T

...

It seems like a trivial problem, but I have had a very hard time finding a simple and efficient way (preferably an online query rather than downloading gigabytes of data) to get the REF and ALT alleles for the rsid list. Any help is very much appreciated. Thanks!

BED R Variant VCF SNP

This may not be the best solution so I am going to leave it as a comment. But it is at least one solution.

Using EntrezDirect:

$ esearch -db snp -query "rs599839" | efetch -format docsum | xtract -pattern DocumentSummary -element DOCSUM | uniq
HGVS=NC_000001.11:g.109279544G>A,NC_000001.11:g.109279544G>C,NC_000001.11:g.109279544G>T,NC_000001.10:g.109822166G>A,NC_000001.10:g.109822166G>C,NC_000001.10:g.109822166G>T|SEQ=[G/A/C/T]|LEN=1|GENE=PSRC1:84722

$ esearch -db snp -query "rs583104" | efetch -format docsum | xtract -pattern DocumentSummary -element DOCSUM | uniq
HGVS=NC_000001.11:g.109278685G>A,NC_000001.11:g.109278685G>C,NC_000001.11:g.109278685G>T,NC_000001.10:g.109821307G>A,NC_000001.10:g.109821307G>C,NC_000001.10:g.109821307G>T|SEQ=[G/A/C/T]|LEN=1
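The HGVS field in that output already spells out REF>ALT per assembly: for chromosome 1, NC_000001.11 is the GRCh38 sequence and NC_000001.10 the GRCh37 one. As a rough sketch, assuming the DOCSUM line always has the layout shown above, the alleles for one accession could be collapsed into a single row like this. The function name `hgvs_to_ref_alt` is hypothetical, and only simple substitutions are handled; deletions, insertions and duplications are skipped.

```shell
#!/usr/bin/env bash
# Sketch: collapse the HGVS terms of one DOCSUM line into
# accession, position, REF and comma-separated ALT alleles.
# Assumption: terms look like ACC:g.POSREF>ALT, as in the output above.
hgvs_to_ref_alt() {
  # $1: RefSeq accession.version to keep, e.g. NC_000001.11
  awk -v acc="$1" '
    BEGIN { OFS = "\t" }
    {
      sub(/^HGVS=/, "")     # drop the field label
      sub(/[|].*/, "")      # drop the SEQ, LEN and GENE fields
      n = split($0, terms, ",")
      pos = ""; ref = ""; alts = ""
      for (i = 1; i <= n; i++) {
        if (index(terms[i], acc ":g.") != 1) continue   # other assembly
        body = substr(terms[i], length(acc) + 4)        # e.g. 109279544G>A
        if (body !~ /^[0-9]+[ACGT]>[ACGT]$/) continue   # substitutions only
        pos = body; sub(/[ACGT]>[ACGT]$/, "", pos)
        ref = substr(body, length(pos) + 1, 1)
        alt = substr(body, length(body), 1)
        alts = (alts == "" ? alt : alts "," alt)
      }
      if (alts != "") print acc, pos, ref, alts
    }'
}

# Example with a shortened copy of the rs599839 line shown above
echo 'HGVS=NC_000001.11:g.109279544G>A,NC_000001.11:g.109279544G>C,NC_000001.11:g.109279544G>T,NC_000001.10:g.109822166G>A|SEQ=[G/A/C/T]|LEN=1' \
  | hgvs_to_ref_alt NC_000001.11
```

For the example line this prints the tab-separated fields NC_000001.11, 109279544, G and A,C,T. The accession still has to be mapped to a chromosome name, and accession versions differ between chromosomes, so the value passed in needs to be checked for each one.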