Hello, I have a GWAS summary statistics file where some of the rows are not rsID but chrID. I want to use bioMart package in R in order to find all the rsIDs for my 232 chrIDs in this format:
chr1_11105545 1 11105545 T C 0.965395 0.72017 -7.40832e-03 0.00460502 0.113547560
In bold the columns corresponding to chromosome (CHR) and position (POS). I know and have tested that it works if I use "values = list(chr_number, pos_number start, pos_number end)", but given that the size is 232, I am trying to write a loop to get all the list done.
I have done this but when I run it I get the error:
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [grch37.ensembl.org:80] Operation timed out after 300001 milliseconds with 15732557 bytes received
Which I think it is due to the fact that I'm somehow making an infinite loop or something of the likes of it? I've been at it for three days now though and I am a bit confused so I'd appreciate if someone helped me find the error or hinted me in the right direction!
#Extract ch_# rows in R dataframe (yields 232 hits)
SSNP_M_Ch <- subset(SSNP_M, grepl("chr\\d_", SNP))
snp = useMart("ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_snp")
results<-c()
for (i in 1:dim(SSNP_M_Ch)[1]){
temp <- getBM(attributes = c('refsnp_id','allele','chrom_start','chrom_strand'),
filters = c('chr_name','start','end'),
values = list(SSNP_M_Ch$CHR,SSNP_M_Ch$POS,SSNP_M_Ch$POS), mart = snp)
temp[] <- i
results <- rbind(results,temp)
}
Thanks!
At least, you can easily check whether there is some progress adding 'print(head(temp))' after getBM query. Hovewer, in such a loop I would expect some subsetting like SSNP_M_Ch$CHR[i] aimed to feed getBM with a single current value. Presented code looks like whole CHR and POS columns of SSNP_M_Ch are passed to getBM.
Also, as far as I can understand, 'temp[] <- i' fills a dataframe after getBM query with a current index value - is it really something it meant to do?
How about this:
By printing each current request, you'll be able to find if some id's cause troubles in getBM.
BTW, I do not know whether the list that you passed to getBW as 'values' argument is correct - repeating POS values twice looks counterintuitive to me, but I did not work with biomartr and can't advice on this matter.
Thanks! I got it sorted in the end, I didn't realize I was passing the whole column instead of going row by row... I ended up doing SSNP_M_Ch$CHR[i,2], SSNP_M_Ch$POS[i,3] and just added the index value as another column in temp and that worked smoothly.