Hi - thanks to some help on here, I am getting used to querying UniProt. My question is how to use both the "random" and "limit" options in the same query.
For example, I have:
with which I am trying to get some transmembrane proteins, randomize the order in which they appear, and then take the first 10. With the random flag set, I would expect to see different(!) proteins each time I run this query. However, I get the same 10 proteins every time; it seems the random flag is being ignored. Maybe this isn't what it's meant for and I have it wrong.
Can I use the random and limit flags together in this way?
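The behaviour described in the question (shuffle all matches, then keep the first 10) can also be done client-side, provided you first fetch the full list of matching accessions in one go. How to fetch that list is not shown here and is an assumption. A minimal base-R sketch of the local shuffle-then-limit step; `pick_random_ids` is a hypothetical helper, not part of UniProt or any package:

```r
# Hypothetical helper: shuffle the full set of matching accessions and keep the first n.
# sample() draws without replacement, so the n IDs returned are always distinct.
pick_random_ids <- function(ids, n = 10) {
  ids <- unique(ids)                      # drop duplicate accessions first
  if (n > length(ids)) stop("n is larger than the number of distinct IDs")
  head(sample(ids), n)                    # random order ("random"), then keep n ("limit")
}

# Demo with made-up placeholder accessions; a different set comes back on each call
ids <- sprintf("ACC%03d", 1:50)
pick_random_ids(ids, 10)
```

Because the shuffle happens locally, repeated calls give different subsets without relying on how the server treats the two flags together.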
EDIT
Using Elisabeth's answer from this thread, I have wrapped the UniProt query in a little R script. The result is similar to Pierre's answer in that thread, but my campus firewall doesn't allow me to connect via MySQL. Here's the script:
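library(XML)
library(httr)
suppressPackageStartupMessages(library("methods"))

# reviewed human (taxon 9606) transmembrane proteins; random=yes asks UniProt for one random match
search.term <- "reviewed:yes+AND+organism:9606+AND+annotation:(type:transmem)&random=yes"
url.name <- paste0("http://www.uniprot.org/uniprot/?query=", search.term)  # same URL every iteration

for (i in 1:10) {
  # each request should land on a randomly chosen matching entry page
  url.get <- GET(url.name)
  url.content <- content(url.get, as = "text")
  # collect every link on that page that points to a FASTA file
  links <- xpathSApply(htmlParse(url.content), "//a[contains(@href, 'fasta')]",
                       xmlGetAttr, "href")
  fasta_link <- paste0("http://www.uniprot.org", links[1])  # links are relative, so prefix the host
  # mode = "a" appends, so all sequences end up in one file
  download.file(fasta_link, "myseqs.fasta", quiet = FALSE, mode = "a")
}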
This downloads 10 transmembrane sequences chosen at random. I haven't quite worked out how to do this without replacement yet (as it stands, the same entry can be drawn twice), but will update when I have. The download.file "mode" argument is set to append ("a") because I wanted to collect all sequences in one file.
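One way to get the sampling without replacement mentioned above is to keep drawing and simply skip accessions that have already been seen. A minimal sketch, assuming a hypothetical zero-argument function `draw_one()` that performs one random=yes request and returns the entry's accession (for example parsed from the FASTA link); that wrapper is not shown and is an assumption, so the demo uses an offline fake drawer instead:

```r
# Keep calling draw_one() until n distinct accessions have been collected.
# draw_one is a hypothetical zero-argument function returning one accession string;
# in the script above it would wrap a single random=yes request.
collect_unique <- function(draw_one, n = 10, max_tries = 10 * n) {
  seen <- character(0)
  tries <- 0
  while (length(seen) < n && tries < max_tries) {
    tries <- tries + 1
    acc <- draw_one()
    if (!(acc %in% seen)) seen <- c(seen, acc)  # skip repeats: sampling without replacement
  }
  if (length(seen) < n) warning("gave up before collecting n distinct accessions")
  seen
}

# Offline demo: a fake drawer picking from 20 placeholder accessions
fake_draw <- function() sample(sprintf("ACC%02d", 1:20), 1)
set.seed(42)
collect_unique(fake_draw, n = 10)
```

Only the accessions in the returned vector would then be passed to download.file, so no sequence is appended to the FASTA file twice. The max_tries cap stops the loop if the query matches fewer than n entries.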
Cheers.
Thanks Elisabeth, that would be really useful. In the meantime, I have used your answer from "Is it possible to download a random set of proteins? (fasta files)" and made a little R script that grabs the FASTA file from a random page. You can loop through the UniProt query as many times as you like.