I got an incomplete list of variants from an old study, which gives only rsIDs or coordinates for the variants (see the example below):
Variant | Coordinate |
---|---|
rs1585453 | 11:46884713 |
10:105087598_AT_A | 10:105087598 |
rs145158522 | 10:106629859 |
rs180940 | 10:115722411 |
11:47277086_CTTTCTTTT_C | 11:47277086 |
...
Now, for my subsequent analysis, I need to convert this into a standard VCF-format table with reference and alternate allele information. I have searched online and found a few ways to do this, such as downloading the large dbSNP reference file, which is not currently possible for me, or using biomaRt. However, biomaRt only returns a single "allele" column that lists all possible alleles for that SNP and does not indicate which one is the reference:
library(biomaRt)
# candidate_variant_id: character vector of rsIDs taken from the list above
snp_mart <- useEnsembl(biomart = "snp", dataset = "hsapiens_snp")
snp_info <- getBM(
  attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end", "chrom_strand", "allele"),
  filters = "snp_filter",
  values = candidate_variant_id,
  mart = snp_mart
)
Output example:
refsnp_id | chr_name | chrom_start | chrom_end | chrom_strand | allele |
---|---|---|---|---|---|
rs583104 | | 109278685 | 109278685 | | G/A/C/T |
rs599839 | | 109279544 | 109279544 | | G/C/T |
...
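As a side note, the entries in my list that are not rsIDs appear to carry their alleles in the ID itself. Assuming the layout is `chr:pos_REF_ALT` (I have not verified this against the study), those rows can be split directly, so only the rsID rows need a lookup. A minimal sketch of that split is below; `parse_allele_ids` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: split IDs of the ASSUMED form "chr:pos_REF_ALT"
# into VCF-style columns. IDs that do not match (e.g. rsIDs) are
# returned separately because they still need a REF/ALT lookup.
parse_allele_ids <- function(ids) {
  pattern <- "^([^:]+):([0-9]+)_([ACGTN]+)_([ACGTN]+)$"
  hit <- grepl(pattern, ids)
  matched <- ids[hit]
  vcf <- data.frame(
    CHROM = sub(pattern, "\\1", matched),
    POS   = as.integer(sub(pattern, "\\2", matched)),
    ID    = matched,
    REF   = sub(pattern, "\\3", matched),
    ALT   = sub(pattern, "\\4", matched),
    stringsAsFactors = FALSE
  )
  list(vcf = vcf, needs_lookup = ids[!hit])
}

# The five example variants from the table above
ids <- c("rs1585453", "10:105087598_AT_A", "rs145158522",
         "rs180940", "11:47277086_CTTTCTTTT_C")
res <- parse_allele_ids(ids)
print(res$vcf)           # rows that already have REF and ALT
print(res$needs_lookup)  # rsIDs that still need a lookup
```

This only reshapes what is already in the ID; it does not check that the assumed REF allele actually matches the reference genome at that position.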
This seems like a trivial problem, but I have had a very hard time finding a simple and efficient way (preferably an online query rather than downloading gigabytes of data) to get the REF and ALT allele information for a list of rsIDs. Any help is very much appreciated. Thanks!
This may not be the best solution, so I am going to leave it as a comment, but it is at least one solution.
Using EntrezDirect: