What Are Your Most-Used Public Data Repositories?
11.2 years ago

If you were to catalog public data repositories that house public "omics" and other high-throughput data, what would you include? What are some of the public data repositories to which you have contributed or that you use regularly? In particular, I'd be interested in hearing about repositories or databases of raw omics data that are off-the-beaten-path but that are critical to your research.

Clarification: I am mainly interested in databases that collect and host omics data. I see, for example, that FlyBase seems to host some modENCODE RNA-seq data.


I use: Sequence Ontology, Gene Ontology, the NHLBI Exome Variant Server, pox.org

11.2 years ago
Dan D 7.4k

Definitely the 1000 genomes project:

http://www.1000genomes.org/data#DataAccess

11.2 years ago
brentp 24k

By far, we use the UCSC Genome Browser and its resources the most. I query its MySQL database quite a bit and use the browser to display our data overlaid on all the existing tracks.

http://genome.ucsc.edu/
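For anyone who hasn't tried the MySQL route: UCSC exposes its annotation tables through a public, read-only MySQL server. Below is a minimal sketch, assuming the `mysql` command-line client is installed and that the host `genome-mysql.soe.ucsc.edu` with the anonymous user `genome` (as described in UCSC's documentation) is still current; older posts use `genome-mysql.cse.ucsc.edu`. The helper names and the example `refGene` query are hypothetical illustrations, not the poster's actual workflow.

```python
import shlex
import subprocess

# Assumption: public read-only UCSC MySQL host and anonymous user as documented by UCSC.
UCSC_HOST = "genome-mysql.soe.ucsc.edu"
UCSC_USER = "genome"


def ucsc_mysql_command(db: str, sql: str) -> list[str]:
    """Build an argument list for the `mysql` CLI (hypothetical helper)."""
    return [
        "mysql",
        f"--user={UCSC_USER}",
        f"--host={UCSC_HOST}",
        "-A",  # --no-auto-rehash: skip table-name completion so the client starts faster
        "-e", sql,
        db,  # the assembly database, e.g. hg19
    ]


def run_query(db: str, sql: str) -> str:
    """Run the query and return tab-separated output (needs network and the mysql client)."""
    result = subprocess.run(
        ucsc_mysql_command(db, sql), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Example query: RefSeq transcripts overlapping a small window on chr1 of hg19.
    query = (
        "SELECT name, chrom, txStart, txEnd FROM refGene "
        "WHERE chrom = 'chr1' AND txStart < 1000000 AND txEnd > 900000"
    )
    # Print the shell command instead of running it, so this works offline.
    print(shlex.join(ucsc_mysql_command("hg19", query)))
```

Building the argument list separately from running it makes it easy to inspect the exact command before hitting a shared public server.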

11.2 years ago

At the risk of stating the obvious, I most often download data from SRA and ArrayExpress (which also has some NGS data).

GEO is also useful for searching for relevant projects because GEO provides links to the corresponding SRA data.

TCGA is also a commonly used resource, but you typically have to get special permission to access raw data.


+1 for TCGA. FWIW, TCGA's "special permission" generally just consists of letting them know what you're going to do with the data and filling out a form. They want the data to be easy to get and a community resource, but have to balance that against concerns about the release of clinical data.

11.2 years ago
Mary 11k

UCSC is my main resource too. But I also use the InterMine databases for the modENCODE data (http://modencode.org/), and the BioMart interface to get at data I need that isn't in UCSC. That connects me to a lot of sources.

My needs are pretty random: sometimes I'll need a big list of fly gene symbols, and then I'll need some cancer data. Another resource I turn to is the International Cancer Genome Consortium: http://icgc.org/

For microbial data I often go to IMG to see what's available. http://img.jgi.doe.gov/
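A "big list of fly gene symbols" is the kind of request BioMart handles well, and it can be scripted through Ensembl's `martservice` endpoint, which takes an XML query. This is a minimal sketch that only builds the query URL; the endpoint, the dataset name `dmelanogaster_gene_ensembl`, and the attribute names `ensembl_gene_id` and `external_gene_name` are assumptions based on Ensembl's BioMart, so check them against the current mart before relying on this.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: Ensembl's BioMart REST endpoint.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def biomart_query_xml(dataset: str, attributes: list[str]) -> str:
    """Serialize a BioMart XML query that returns the given attributes as TSV."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def biomart_url(xml_query: str) -> str:
    """GET URL for the query; fetch it with urllib or curl to download the TSV."""
    return f"{MARTSERVICE}?{urlencode({'query': xml_query})}"


if __name__ == "__main__":
    # Hypothetical example: fly gene IDs with their gene symbols.
    xml_query = biomart_query_xml(
        "dmelanogaster_gene_ensembl", ["ensembl_gene_id", "external_gene_name"]
    )
    print(biomart_url(xml_query))
```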

11.2 years ago
lwc628 ▴ 230

Ensembl (http://useast.ensembl.org/info/data/ftp/index.html). No?

I download all my references and annotations from there.
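Ensembl's FTP site follows a predictable layout, so reference and annotation downloads are easy to script. This is a minimal sketch, assuming the `https://ftp.ensembl.org/pub/release-N/...` directory structure and file-naming pattern used in recent releases; the release number and assembly below are examples only. Not every species ships a `primary_assembly` FASTA (some only have `toplevel`), hence the `kind` parameter.

```python
ENSEMBL_FTP = "https://ftp.ensembl.org/pub"  # assumption: current Ensembl FTP root


def ensembl_genome_fasta_url(release: int, species: str, assembly: str,
                             kind: str = "primary_assembly") -> str:
    """URL of a genome FASTA, e.g. kind='primary_assembly' or 'toplevel'."""
    prefix = species.capitalize()  # homo_sapiens -> Homo_sapiens
    return (f"{ENSEMBL_FTP}/release-{release}/fasta/{species}/dna/"
            f"{prefix}.{assembly}.dna.{kind}.fa.gz")


def ensembl_gtf_url(release: int, species: str, assembly: str) -> str:
    """URL of the gene annotation GTF for the same release."""
    prefix = species.capitalize()
    return (f"{ENSEMBL_FTP}/release-{release}/gtf/{species}/"
            f"{prefix}.{assembly}.{release}.gtf.gz")


if __name__ == "__main__":
    # Example values; pick the release you need and keep FASTA and GTF from the same one.
    print(ensembl_genome_fasta_url(110, "homo_sapiens", "GRCh38"))
    print(ensembl_gtf_url(110, "homo_sapiens", "GRCh38"))
```

Pinning the release number in both URLs keeps the genome sequence and the annotation consistent with each other.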

11.2 years ago
Stephen 2.8k

I use GEO frequently, and dbGaP when I have to; access is painful.

11.2 years ago
zx8754 12k

We use 1000 genomes project, UCSC genome browser tables, TCGA, and we contribute to ICGC Prostate Cancer - http://icgc.org/icgc/cgp/70/508/71331
