Where can I find a good RNA-seq dataset on cancer?
4.8 years ago


It is often hard to find the right omics data for a precision oncology research project. Given the impact of next-generation sequencing and the explosive growth of publicly available data, you might wonder where the cancer RNA-seq datasets are and how easy it is to find the one you are looking for.

While interacting with many students during our OmicsLogic educational programs, we realized the need for high-quality data sources that anyone can learn about and use. Good data comes from collections that maintain a consistent level of metadata annotation, impose minimal restrictions, and give easy access to all the files. Useful annotation includes, for example, detailed phenotypic information for each sample as well as file sizes and the sequencing instruments used. Another criterion is the number of replicates, whether technical or biological: the best repositories contain many samples per study.

We compiled a small list of resources where you can find RNA-seq data to start your oncology bioinformatics project:

1. Elixir’s Expression Atlas


2. NCBI – National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/bioproject)

While it is not the easiest place to find a dataset you are interested in, once you learn to navigate the NCBI site, you can find a lot of good datasets. A BioProject is a collection of biological data related to a single initiative, originating from a single organization or a consortium. A BioProject record gives users a single place to find links to the diverse data types generated for that project. As you search, you can narrow the results by data type (RNA-seq), type of cancer, and organism.
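
The same kind of narrowing can also be scripted. Below is a minimal sketch that only builds a search URL for NCBI's E-utilities `esearch` endpoint against the `bioproject` database; it does not send the request. The helper names, the example cancer type, and the use of the `[Organism]` field tag are assumptions made for illustration, so check the resulting query in the BioProject web search before relying on it.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_bioproject_query(cancer_type, organism="Homo sapiens", assay="RNA-seq"):
    """Combine assay, cancer type, and organism into one search term.

    Hypothetical helper: the [Organism] field tag is an assumption and
    may need adjusting for the BioProject database.
    """
    return f'"{assay}" AND "{cancer_type}" AND "{organism}"[Organism]'


def build_esearch_url(term, retmax=20):
    """Return an esearch URL for the bioproject database (no request is made)."""
    params = {
        "db": "bioproject",   # database to search
        "term": term,         # the query string built above
        "retmode": "json",    # ask for JSON instead of XML
        "retmax": retmax,     # maximum number of IDs to return
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


term = build_bioproject_query("breast cancer")
url = build_esearch_url(term)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a list of matching BioProject IDs that you can then look up on the site.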


3. TCGA – The Cancer Genome Atlas

Finally, we cannot ignore the Cancer Genome Atlas – a huge repository of data that can be very useful for a variety of reasons.


Did we forget to mention a major cancer RNA-seq resource that you like? Let us know by posting a comment below!

You can also explore the same post with more information here and find similar posts here.

Mohit Mazumder,
Ph.D., Machine Learning & Computational Biology
Pine Biotech, Inc.
USA

17 months ago
Zhenyu Zhang ★ 1.2k

Go to GDC; there are currently about 19k cases with RNA-Seq data (11k from TCGA).
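
For anyone who prefers to script the search, here is a minimal sketch of a query against the GDC API `files` endpoint that asks for open-access RNA-Seq files from the TCGA program. It only builds the request parameters and does not contact the server. The choice of filters and returned fields is an assumption for illustration; raw reads are generally controlled access, so a query like this would mostly surface processed files such as expression quantifications.

```python
import json

# Public GDC API endpoint for file-level queries.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_rnaseq_filter(program="TCGA", access="open"):
    """Build a GDC filter: RNA-Seq files, given program, given access level."""
    def is_in(field, values):
        # One "in" clause of the GDC filter syntax.
        return {"op": "in", "content": {"field": field, "value": values}}

    return {
        "op": "and",
        "content": [
            is_in("experimental_strategy", ["RNA-Seq"]),
            is_in("cases.project.program.name", [program]),
            is_in("access", [access]),
        ],
    }


filters = build_rnaseq_filter()

# Query-string parameters for a GET request; the filter is passed as JSON text.
params = {
    "filters": json.dumps(filters),
    "fields": "file_id,file_name,cases.submitter_id",
    "format": "JSON",
    "size": "10",
}

print(GDC_FILES_ENDPOINT)
print(json.dumps(filters, indent=2))
```

Passing `params` to an HTTP client as the query string of a GET request to the endpoint should return matching file records; changing `access` to `"controlled"` would list files that need dbGaP authorization to download.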


It would be nice to have a cancer resource where all the RNA-seq (and other -seq) data are easy to obtain in raw FASTQ format. GDC requires going through dbGaP (which is very difficult for independent student researchers to get access to).


Human genomic sequencing data is treated as Personally Identifiable Information (PII) under US regulations. If going through dbGaP is difficult for you, other routes will be more complicated.


True, but a cancer dataset doesn't necessarily need to consist of patient clinical samples. And there are plenty of public, de-identified raw RNA-seq patient cohort samples on SRA, e.g. https://www.nature.com/articles/s41467-019-09374-9

Would be useful if a list of such datasets could be curated.

That said, TCGA is the most comprehensive cancer dataset/resource if one can go through dbGaP (for which I'll try to get PI approval soon).


Some legacy data are still around, like the 1000 Genomes (1kg) data (germline) and the CCLE data (cancer cell lines). New sequencing projects normally require patient consent before sequence reads can be put in open access. By the way, for CCLE, I believe data for some older cell lines are open somewhere; the Broad Institute did some new sequencing, and I am not sure whether the new data are still fully open.


After trying a few things, this patient dataset - https://www.nature.com/articles/sdata201610 - was the easiest to get (it can be obtained via direct download without sending emails, applications, etc.).

It's one of the few like that.
