1 10019 10020 TA T rs775809821
1 10020 10020 A - rs775809821
1 10055 10055 - A rs768019142
1 10055 10055 T TA rs768019142
1 10108 10108 C T rs62651026
1 10109 10109 A T rs376007522
Then it's just a matter of picking and associating the rsID from this table. Unix join or merge() in R can do that easily.
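For anyone who wants to do this lookup in Python instead, here is a minimal sketch that streams the table and keeps only the rows for the rsIDs you ask for, so a ~2 GB file never has to fit in memory. It assumes the six whitespace-separated columns shown above are chromosome, start, end, reference allele, alternate allele and rsID; the function name `rows_for_rsids` and the `SAMPLE` string are made up for illustration. Note that one rsID can appear on several rows, so each rsID maps to a list.

```python
import io


def rows_for_rsids(lines, wanted):
    """Stream table lines and collect the rows whose rsID is in `wanted`.

    Assumed column order (not confirmed by the thread):
    chrom, start, end, ref, alt, rsid.
    """
    wanted = set(wanted)
    hits = {rsid: [] for rsid in wanted}
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        chrom, start, end, ref, alt, rsid = fields
        if rsid in wanted:
            hits[rsid].append((chrom, int(start), int(end), ref, alt))
    return hits


# The six rows quoted above, used as stand-in input.
SAMPLE = """\
1 10019 10020 TA T rs775809821
1 10020 10020 A - rs775809821
1 10055 10055 - A rs768019142
1 10055 10055 T TA rs768019142
1 10108 10108 C T rs62651026
1 10109 10109 A T rs376007522
"""

hits = rows_for_rsids(io.StringIO(SAMPLE), ["rs62651026", "rs775809821", "rs0000"])
for rsid, rows in sorted(hits.items()):
    print(rsid, rows)
```

For the real file, pass an open file handle instead of `io.StringIO(SAMPLE)`; rsIDs that are not in the table come back with an empty list.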
I guess you are actually right about that, Santosh. Thanks for your suggestion. I'll go for it despite the file being nearly 2 GB in size. I'm just surprised Biopython doesn't have a simple function for doing it.
Thanks Emily! Currently I have hundreds of rsIDs, and probably never more than tens of thousands. The solution from Pierre is fast for hundreds of rsIDs. I know how to do it in Python, but that's quite a few lines of code.
library("biomaRt")

# Connect to the GRCh37 Ensembl SNP mart
snp_mart <- useMart(biomart = "ENSEMBL_MART_SNP",
                    host = "grch37.ensembl.org",
                    path = "/biomart/martservice",
                    dataset = "hsapiens_snp")

# list of variables (attributes) that can be retrieved
# listAttributes(mart = snp_mart)
# list of keywords (filters) that you can query on
# listFilters(mart = snp_mart)

# df is assumed to be a data frame with the rsIDs in a column named rsid
out <- getBM(attributes = c('refsnp_id', 'chr_name', 'chrom_start', 'allele'),
             filters = c('snp_filter'),
             values = list(df$rsid),
             mart = snp_mart)
written 5.4 years ago by Maki, updated 22 months ago by Ram
If one wants to see all of the available columns, then do:
add -B to get the output as TSV....