Yes, you can run an advanced SRA search by qualifying your search terms with specific fields. Unfortunately, for the less widely used databases, documentation is lacking and it is difficult to discover what these fields are.
One approach is to use the NCBI EInfo utility. This does require some programming skills (feel free to skip the discussion below; the results are at the end). For example, if you install BioPerl, you should find a utility Perl script named bp_einfo on your machine. You can run:
bp_einfo -d sra
to see the searchable fields in the SRA database, or just run bp_einfo for a list of all databases.
If you run Ruby on your machine, here is a small script that uses the Hpricot library to fetch EInfo data for SRA and print out some information:
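#!/usr/bin/ruby
require 'rubygems'
require 'hpricot'
require 'open-uri'

# Fetch the EInfo XML for the SRA database.
# Note: on Ruby 3.0 and later, call URI.open instead of open.
doc = Hpricot(open("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=sra"))

# Print the name, full name and description of each searchable field.
(doc/'//fieldlist/field').each do |f|
  name = (f/'/name').inner_html
  fullname = (f/'/fullname').inner_html
  description = (f/'description').inner_html
  puts "#{name}, #{fullname}, #{description}"
end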
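Hpricot is no longer maintained, so here is a rough sketch of the same idea using only libraries that ship with Ruby (net/http and REXML). The helper names fetch_einfo_xml and parse_einfo_fields are my own, and the element names (FieldList, Field, Name, FullName, Description) are what I assume the EInfo XML uses; REXML is case-sensitive, unlike the Hpricot paths above.

```ruby
require 'net/http'
require 'rexml/document'
require 'uri'

# Same EInfo endpoint as the script above.
EINFO_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi'

# Hypothetical helper: download the EInfo XML for one database (needs network access).
def fetch_einfo_xml(db)
  uri = URI(EINFO_URL)
  uri.query = URI.encode_www_form(db: db)
  Net::HTTP.get(uri)
end

# Hypothetical helper: turn EInfo XML into an array of
# [name, full name, description] triples, one per searchable field.
# A missing child element becomes an empty string.
def parse_einfo_fields(xml)
  doc = REXML::Document.new(xml)
  fields = []
  doc.elements.each('//FieldList/Field') do |field|
    fields << %w[Name FullName Description].map do |tag|
      node = field.elements[tag]
      node ? node.text.to_s : ''
    end
  end
  fields
end

# Usage (requires network access):
#   parse_einfo_fields(fetch_einfo_xml('sra')).each { |row| puts row.join(', ') }
```

Keeping the parsing separate from the download means you can also run parse_einfo_fields on a saved copy of the XML.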
If you don't have these tools, don't worry; here is the output of the Ruby code:
ALL, All Fields, All terms from all searchable fields
UID, UID, Unique number assigned to publication
FILT, Filter, Limits the records
ACCN, Accession, Accession number of sequence
TITL, Title, Words in definition line
PROP, Properties, Classification by source qualifiers and molecule type
WORD, Text Word, Free text associated with record
ORGN, Organism, Scientific and common names of organism, and all higher levels of taxonomy
AUTH, Author, Author(s) of publication
PDAT, Publication Date, Date sequence added to GenBank
MDAT, Modification Date, Date of last update
So, those are the fields for advanced search. For example, to limit the search to humans, you could try Homo sapiens[ORGN]. If you dig around in the NCBI documentation (or search the web), you should be able to figure out useful ways to use the other fields, such as PROP and FILT.
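If you want to run such a field-qualified search from a script rather than the web form, the same term can be passed to the NCBI ESearch utility. This is only a sketch: the helpers qualify and esearch_url are names I made up, the retmax value is arbitrary, and the second search term in the usage comment is just an illustration.

```ruby
require 'uri'

# ESearch endpoint of the NCBI E-utilities.
ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

# Hypothetical helper: attach a field tag to a term, e.g. "Homo sapiens[ORGN]".
def qualify(term, field)
  "#{term}[#{field}]"
end

# Hypothetical helper: build an ESearch URL from field-qualified terms,
# combined with the Entrez Boolean operator AND.
def esearch_url(db, qualified_terms, retmax = 20)
  query = URI.encode_www_form(db: db,
                              term: qualified_terms.join(' AND '),
                              retmax: retmax)
  "#{ESEARCH_URL}?#{query}"
end

# Build (but do not fetch) the URL for the human-only search from the text.
puts esearch_url('sra', [qualify('Homo sapiens', 'ORGN')])

# Terms in several fields can be combined, for example:
#   esearch_url('sra', [qualify('Homo sapiens', 'ORGN'), qualify('cancer', 'TITL')])
```

Opening the printed URL in a browser (or fetching it with net/http) should return XML listing the matching record IDs.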
The European Nucleotide Archive (ENA) and the DDBJ Sequence Read Archive (DRA) also provide access to the SRA data through their services.
Thanks, I have also had trouble finding what I want in the Short Read Archive.
This is fantastic, many thanks