Hi,
I have a list of about 15,000 SNPs and I want to know which of them are mentioned in PubMed.
Is there any software or package to do this?
Thank you
I wasn't going to add any thoughts to this because it is straightforward to do what Pierre suggests. But I have decided to add some comments (and a couple of ideas) to supplement that. This may be beyond what you need--feel free to ignore this.
If you search abstracts only, you are going to get a small fraction of what might be available. PubMed Central (PMC) offers more access, of course, but some papers will still be locked up in publisher silos. Beyond that, given the way publication is done today, some of the SNPs will be in supplements, or not even mentioned by name but submitted as a set with only a reference to a giant submission. That said, those individual SNPs probably also don't have much useful biology attached to them, which is probably the point of this.
There are at least 2 additional things I'd do:
Check out the new Publications track at UCSC. Although I didn't highlight all the details in my tip, one of the aspects is that SNPs in the Elsevier papers and in PMC have been extracted and applied to the UCSC data for some species (so if your species isn't there, sorry). I haven't tried it--but I assume like other tracks you could list your SNPs and see if they match any of these papers. And even if you don't have full access to the papers at the publishing site, you'll get a bit of text for context from the page and you can see if it's worth looking further. For example: http://genome.ucsc.edu/cgi-bin/hgc?hgsid=279670783&c=chr4&o=110901197&t=110901198&g=pubsMarkerSnp&i=rs2237051
In this study, four SNPs, including rs2237051, rs2250724, rs6533482, and rs2237043 were selected from....
I really liked the idea of this GRAIL tool, because it looked for your SNP and then also looked for nearby data via text mining. That said, I'm not sure how current the maintenance is. But still--I would look.
And I found a bug on this comment, reporting to Istvan... http://screencast.com/t/1Uyjvn3t
I tried your suggestions:
1: I like the new track. It will be useful when I work with a specific region, but in my case my SNPs are distributed across the genome.
2: I tried GRAIL, but I was unable to make it work with my SNPs. It seems outdated because it uses HapMap release 22.
Thanks for your help.
It doesn't matter where they are--you can create a table browser query with the SNP ids and do all of them. If you haven't tried that you can see how in our tutorial on the Table Browser: http://www.openhelix.com/ucscadv
You could use NCBI ELink: http://www.ncbi.nlm.nih.gov/books/NBK25499/
For example, for rs2279744: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=snp&db=pubmed&id=2279744
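For a list of 15k SNPs you would not want to click through those URLs by hand, so here is a minimal Python sketch of the same ELink call in batches. The helper names (build_elink_url, parse_elink_xml, fetch_pubmed_links) are made up for this example. It assumes the https form of the endpoint above, and that repeating the id parameter returns one LinkSet per SNP with linked PMIDs under LinkSetDb/Link/Id. Check that against a real response before relying on it.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: same ELink endpoint as the example URL, requested over https.
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(rs_ids):
    """Build one ELink URL for a batch of rs numbers (hypothetical helper)."""
    params = [("dbfrom", "snp"), ("db", "pubmed")]
    for rs in rs_ids:
        # ELink takes the bare number, so "rs2279744" becomes "2279744".
        number = rs.strip().lower()
        if number.startswith("rs"):
            number = number[2:]
        # Repeating "id" asks for a separate LinkSet per SNP.
        params.append(("id", number))
    return ELINK_URL + "?" + urllib.parse.urlencode(params)


def parse_elink_xml(xml_text):
    """Map each SNP in the response to its list of linked PMIDs."""
    links = {}
    root = ET.fromstring(xml_text)
    for linkset in root.iter("LinkSet"):
        pmids = [e.text for e in linkset.findall("LinkSetDb/Link/Id")]
        # Drop duplicates (a SNP can have several link types), keep order.
        pmids = list(dict.fromkeys(pmids))
        for source in linkset.findall("IdList/Id"):
            links["rs" + source.text] = pmids
    return links


def fetch_pubmed_links(rs_ids):
    """Query ELink for one batch of SNPs; needs network access."""
    with urllib.request.urlopen(build_elink_url(rs_ids)) as response:
        return parse_elink_xml(response.read().decode("utf-8"))
```

Calling fetch_pubmed_links(["rs2279744"]) should give a dict keyed by rs number. SNPs with an empty list have no PubMed links. For the full list, send a modest batch per request and pause between requests. NCBI asks clients without an API key to stay at or below 3 requests per second.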
http://www.SNPedia.com is built to do most of what you need. This Perl code will download all the rs#s from there:
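use strict;
use warnings;
use MediaWiki::Bot;

# Connect to the SNPedia bot endpoint.
my $bot = MediaWiki::Bot->new({
    protocol => 'http',
    host     => 'bots.snpedia.com',
    path     => '/',
});
$bot->{api}->{use_http_get} = 1;    # send API requests with HTTP GET

# max => 0 asks for every page in the category, not just the first batch.
my @rsnums = $bot->get_pages_in_category('Category:Is_a_snp', { max => 0 });
print join("\n", @rsnums), "\n";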
Redirect that to a file and intersect it with your list of 15k. A few of the SNPs in here exist in OMIM but not in PubMed, but you're probably OK with that. If not, the examples at http://snpedia.com/index.php/Bulk should allow you to extract the page text and keep only the ones which include the text 'PMID'.
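If the intersect step is the sticking point, it could look roughly like this in Python. This is only a sketch. The function names are hypothetical, and it assumes both files hold one identifier per line and that the SNPedia titles may differ in case from yours (e.g. Rs53576 vs rs53576), so everything is lower-cased before comparing.

```python
def normalize_rs(name):
    """Trim whitespace and lower-case an identifier."""
    return name.strip().lower()


def read_rs_ids(path):
    """Read one identifier per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [normalize_rs(line) for line in handle if line.strip()]


def intersect_rs(my_ids, snpedia_ids):
    """Return the entries of my_ids that also appear in snpedia_ids, in order."""
    known = {normalize_rs(name) for name in snpedia_ids}
    return [rs for rs in map(normalize_rs, my_ids) if rs in known]
```

With the Perl output saved as, say, snpedia_rs.txt and your own list as my_snps.txt (both names are placeholders), intersect_rs(read_rs_ids("my_snps.txt"), read_rs_ids("snpedia_rs.txt")) gives the SNPs from your list that have a SNPedia page.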
I tried your method, but I wasn't able to make the bot work. I don't know Perl. But I found this on SNPedia: http://kokki.uku.fi/bioinformatics/varietas/. Very useful, but not exactly what I need.
Your solution is the simplest and it works. Thanks.