Searching NCBI's Short Read Archive
14.3 years ago
Wjeck ▴ 490

I am trying to search NCBI's Short Read Archive and having a hard time of it. Say I want sequencing results from a single kind of experiment (e.g. ChIP-seq) of a certain type (e.g. paired-end sequencing with reads >76 bp). Currently I can type "ChIP seq paired end 76", but the "paired end" and "76" terms aren't getting me what I want, especially since I want any and all read sizes >76. Is it possible to run an 'advanced' search on these kinds of fields?

ncbi sra data • 5.5k views
14.3 years ago
Neilfws 49k

Yes, you can run an advanced SRA search by qualifying your search terms with specific fields. Unfortunately, for the less widely used databases, documentation is lacking and it is difficult to discover what these fields are.

One approach is to use the NCBI EInfo utility. This requires some programming skills (skip the discussion below if you like; the results are at the end). For example, if you install BioPerl, you should find a utility Perl script named bp_einfo on your machine. You can run:

bp_einfo -d sra

to see the searchable fields in the SRA database. Or just run bp_einfo for a list of all databases.

If you run Ruby on your machine, here is a small script that uses the Hpricot library to fetch EInfo data for SRA and print out some information:

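#!/usr/bin/ruby
# Fetch the EInfo record for the SRA database and list its searchable fields.
require 'rubygems'
require 'hpricot'
require 'open-uri'

# URI.open is used because a bare open(url) no longer fetches URLs in Ruby 3.
doc = Hpricot(URI.open("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=sra"))

# Print one line per field: short name, full name, description.
(doc/'//fieldlist/field').each do |f|
  name = (f/'/name').inner_html
  fullname = (f/'/fullname').inner_html
  description = (f/'description').inner_html
  puts "#{name}, #{fullname}, #{description}"
end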
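
If Hpricot is not available, the same field extraction can be sketched with REXML, which ships with standard Ruby installs. This is only an illustration: the method names einfo_fields and sample_einfo_xml are made up here, and the sample XML is a hand-trimmed stand-in for the real EInfo response, assumed to use the FieldList/Field/Name/FullName/Description element names. To run it against live data, pass in the body fetched from the EInfo URL instead of the sample.

```ruby
require 'rexml/document'

# Hand-trimmed stand-in for an EInfo response (assumption: the live
# response uses these element names; only two fields are shown).
def sample_einfo_xml
  <<~XML
    <eInfoResult>
      <DbInfo>
        <DbName>sra</DbName>
        <FieldList>
          <Field>
            <Name>ALL</Name>
            <FullName>All Fields</FullName>
            <Description>All terms from all searchable fields</Description>
          </Field>
          <Field>
            <Name>ORGN</Name>
            <FullName>Organism</FullName>
            <Description>Scientific and common names of organism, and all higher levels of taxonomy</Description>
          </Field>
        </FieldList>
      </DbInfo>
    </eInfoResult>
  XML
end

# Return one [name, full name, description] triple per <Field> element.
# A missing child element yields an empty string rather than an error.
def einfo_fields(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//FieldList/Field').map do |field|
    %w[Name FullName Description].map { |tag| field.elements[tag]&.text.to_s }
  end
end

einfo_fields(sample_einfo_xml).each { |row| puts row.join(', ') }
```
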

If you don't have these tools, don't worry; here is the output of the Ruby code:

ALL, All Fields, All terms from all searchable fields
UID, UID, Unique number assigned to publication
FILT, Filter, Limits the records
ACCN, Accession, Accession number of sequence
TITL, Title, Words in definition line
PROP, Properties, Classification by source qualifiers and molecule type
WORD, Text Word, Free text associated with record
ORGN, Organism, Scientific and common names of organism, and all higher levels of taxonomy
AUTH, Author, Author(s) of publication
PDAT, Publication Date, Date sequence added to GenBank
MDAT, Modification Date, Date of last update

So, those are the fields for advanced search. For example, to limit the search to humans, you could try Homo sapiens[ORGN]. If you dig around in the NCBI documentation (or search the web), you should be able to figure out useful ways to use the other fields, such as PROP and FILT.
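
As a rough sketch of how field-qualified terms combine, the snippet below joins several value[FIELD] clauses with AND and builds an ESearch URL for the sra database. The helper names sra_term and esearch_url are hypothetical, and the example clauses (Homo sapiens in ORGN, ChIP-seq in ALL) are only illustrative; whether they return the records you want is something to check against the live service.

```ruby
require 'uri'

# Build a query from [field, value] pairs, e.g. [['ORGN', 'Homo sapiens']]
# becomes "Homo sapiens[ORGN]". Clauses are joined with AND.
def sra_term(clauses)
  clauses.map { |field, value| "#{value}[#{field}]" }.join(' AND ')
end

# Build an ESearch URL for the sra database. retmax caps the number of
# IDs returned; the default of 20 here is an arbitrary choice.
def esearch_url(term, retmax = 20)
  query = URI.encode_www_form([['db', 'sra'], ['term', term], ['retmax', retmax]])
  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?#{query}"
end

term = sra_term([['ORGN', 'Homo sapiens'], ['ALL', 'ChIP-seq']])
puts term
puts esearch_url(term)
```

The same term string can be pasted directly into the search box on the SRA web page; the URL form is only needed for scripted searches.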


The European Nucleotide Archive (ENA) and the DDBJ Sequence Read Archive (DRA) also provide access to the SRA data through their services.


Thanks, I have also had trouble finding what I want in the Short Read Archive.


This is fantastic, many thanks

14.3 years ago

Have you tried the Bioconductor package SRAdb for these types of searches?
