Rna-Seq Data In Public Database
4
4
12.2 years ago
camelbbs ▴ 710

Is there a public database that stores RNA-seq data, the way GEO stores microarray data?

I know about SRA, but what it offers is not enough for me. Is there a better one? I want to search for some specific cell lines that have been sequenced by RNA-seq. Where can I find them? Thanks!

rna-seq • 26k views
0

InSilico DB has just released a beta integration with Ingenuity iReport. You can export public RNA-seq data from GEO/SRA and get a free iReport preview.

6
12.2 years ago

Ironically, the answer to your question might be GEO. Other than SRA, it's the largest collection of RNA-seq data that I know of. I'm not sure what platform you are looking for or what species your cell line is from. But let's assume you want Illumina RNA-seq data for human lines. You might start by searching GEO platforms for "Illumina homo sapiens". This identifies 8 platforms, three of which have substantial numbers of samples submitted to GEO:

  • GPL9115: Illumina Genome Analyzer II (Homo sapiens) = 3466 samples
  • GPL10999: Illumina Genome Analyzer IIx (Homo sapiens) = 2274 samples
  • GPL11154: Illumina HiSeq 2000 (Homo sapiens) = 1695 samples

You can then search for one of these platforms plus the name of your cell line of interest and hope you get lucky. An example query might look like this

Another option is to search for records where the Platform Technology Type = "high-throughput sequencing".
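To make the two search strategies above concrete, here is a minimal sketch of how such query strings could be put together. The cell line name (`HeLa`) is a placeholder, and the helper names `field` and `geo_term` are made up for illustration. The field tags (`[ACCN]`, `[Organism]`, `[Platform Technology Type]`) follow GEO DataSets search syntax as I understand it, so check them against GEO's own field list before relying on them.

```python
def field(value: str, name: str) -> str:
    """Quote a value and attach a search field tag, e.g. "x"[Organism]."""
    return f'"{value}"[{name}]'


def geo_term(*clauses: str) -> str:
    """Join the non-empty clauses with AND into one search string."""
    return " AND ".join(c for c in clauses if c)


CELL_LINE = "HeLa"  # placeholder: substitute your own cell line

# Strategy 1: one sequencing platform plus the cell line name.
platform_query = geo_term("GPL11154[ACCN]", CELL_LINE)

# Strategy 2: any high-throughput sequencing platform, human only.
technology_query = geo_term(
    field("high throughput sequencing", "Platform Technology Type"),
    field("Homo sapiens", "Organism"),
    CELL_LINE,
)

print(platform_query)
print(technology_query)
```

The resulting strings can be pasted into the GEO search box. Keeping the cell line as a free-text clause is deliberate, since cell line names are not stored in a single dedicated field.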

NOTE: GEO seems to still define "platforms" in the next-gen-sequence space quite crudely by simply the sequencer and not the type of sequencing done. A GEO platform of GPL96 (Affymetrix U133A) would definitely indicate an RNA expression dataset with clearly defined parameters. But, the platform of GPL9115 might (and does) indicate any of RNA-seq, ChIP-seq, miRNA sequencing, ChIA-PET, DamIP-seq, bisulfite sequencing, etc. To say nothing of differences in read length, paired vs single-end, polyA selection method, etc. So read carefully before proceeding with any dataset.
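One way to act on this is to check each record's declared assay type rather than trusting the platform. A minimal sketch, assuming you have already collected sample metadata into dictionaries. The keys `library_strategy` and `library_layout` and the example records are hypothetical, modelled on the library fields that sequencing submissions usually carry.

```python
from collections import Counter


def rna_seq_only(records):
    """Keep records whose declared library strategy is RNA-Seq."""
    return [
        r for r in records
        if r.get("library_strategy", "").strip().lower() == "rna-seq"
    ]


# Hypothetical records: same platform, different assays.
samples = [
    {"id": "sample_A", "platform": "GPL9115",
     "library_strategy": "RNA-Seq", "library_layout": "PAIRED"},
    {"id": "sample_B", "platform": "GPL9115",
     "library_strategy": "ChIP-Seq", "library_layout": "SINGLE"},
    {"id": "sample_C", "platform": "GPL9115",
     "library_strategy": "Bisulfite-Seq", "library_layout": "SINGLE"},
    {"id": "sample_D", "platform": "GPL9115",
     "library_strategy": "rna-seq ", "library_layout": "SINGLE"},
]

kept = rna_seq_only(samples)

# Tally paired vs single-end among the RNA-seq samples that remain.
layouts = Counter(r["library_layout"] for r in kept)

print([r["id"] for r in kept])
print(dict(layouts))
```

The comparison is case-insensitive and ignores stray whitespace because hand-entered metadata is rarely consistent.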

Finally, if you know for a fact that your special cell line has been RNA-seq'd but can't find it in SRA or GEO you may have to contact the authors (if the study has been published). Many NGS studies are still not being made available. But, they should be...

0

Thanks a lot.

0

BTW, one more thing: does GEO include all of the SRA info (everything except the data itself)? When I checked SRA, I found that they have a GEO query.

0

I'm not sure. But, I suspect you will find a whole variety of situations where sometimes data is in SRA and linked from GEO or vice versa and other times data has just been submitted to one or the other (or neither).

3
12.2 years ago
Markus Krupp ▴ 100

...just a little addition to Obi's reply.

You can use the GEO advanced search interface: http://www.ncbi.nlm.nih.gov/gds/advanced/ ...here you can choose between several fields and also an option to list the corresponding indices.

e.g. choosing the field "Platform Technology Type", clicking "show index list", and selecting the "high throughput sequencing" index returns 26597 entries.

...use combinations of those options within the advanced search interface and you will end up with a good repertoire of RNA-seq data.
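The same kind of search can also be scripted. A minimal sketch, assuming NCBI's E-utilities `esearch` endpoint with `db=gds` (the GEO DataSets database) and its JSON output. The function names are made up here and the query term is only an example. `fetch_hits` needs network access and is defined but not called in the snippet.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL against GEO DataSets for the given term."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_hits(json_text: str):
    """Return (total hit count, list of record IDs) from an esearch reply."""
    result = json.loads(json_text)["esearchresult"]
    return int(result["count"]), list(result["idlist"])


def fetch_hits(term: str, retmax: int = 20):
    """Run the search over the network and parse the reply."""
    with urlopen(esearch_url(term, retmax)) as response:
        return parse_hits(response.read().decode("utf-8"))


# Example term: combine index fields as in the advanced search form.
term = (
    '"high throughput sequencing"[Platform Technology Type]'
    ' AND "Homo sapiens"[Organism]'
)
print(esearch_url(term))
```

Calling `fetch_hits(term)` would return the total number of matching entries plus the first `retmax` record IDs. The count will differ from the number quoted above, since GEO keeps growing.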

0

Thanks, that's helpful!

2
12.2 years ago

Running a large-scale data distribution service is an expensive operation. Since paying for a data download is not something people would do, it shouldn't come as a surprise that there aren't that many options to choose from.

Besides SRA, the only large-scale data source that comes to mind is the ENCODE data downloads.

0

Thanks. Maybe SRA is the one, then.

2
11.4 years ago
alaincoletta ▴ 170

Check InSilico DB (https://insilicodb.org): 100,000s of manually curated, pre-processed profiles, freely available and ready to analyse. RNA-seq data is pre-processed with the tophat-cufflinks-cuffdiff-cummeRbund pipeline. See https://insilicodb.org/differential-gene-expression-heatmap-from-rnaseq-data-using-cummerbund/ for a step-by-step example. The data comes from GEO and SRA, but it has been curated and pre-processed.

Highly accessed Genome Biology paper: http://genomebiology.com/2012/13/11/R104
