How to get Normal Samples from TCGA?
4 months ago
Jaber ▴ 30

Greetings BioStars,

I recently came across a study that used the TCGA-LIHC dataset in its pipeline, including data from 50 normal liver samples. I'm curious how they obtained so many normal samples, as I couldn't find them through GDC or TCGAbiolinks.

Aside from GEO, are there other reliable sources where normal samples for bulk RNA-seq can be accessed?

I appreciate your help!


Normal samples may still require approval for access due to privacy concerns, especially if you are looking to get the original sequence data.


Thank you. I just want the processed transcriptomic data; I don't need the raw data.


came across a study that utilized the TCGA-LIHC dataset in their pipeline, including data from 50 normal liver patients.

Do you know whether they used "normal" data from TCGA, or was it from some other source?


You can use TCGAbiolinks to download the normal samples (the processed files, not the raw data) from TCGA. I do not know why it is not working for you.

GTEx is another resource for normal data. It contains only normal samples.

https://gtexportal.org/home/


Thank you so much.

I used this code to access the TCGA-LIHC clinical data:

clin_liver <- GDCquery_clinic(project = "TCGA-LIHC", type = "clinical")
View(clin_liver)

I could not identify any normal cases in it.

I'll try the GTEx portal.

library(TCGAbiolinks)

# Build the query, restricted to solid tissue normal samples
query.exp <- GDCquery(
    project = "TCGA-LIHC",   # project name
    data.category = "Transcriptome Profiling",
    data.type = "Gene Expression Quantification",
    workflow.type = "STAR - Counts",
    sample.type = c("Solid Tissue Normal")   # the sample type to retrieve
)


# download the list of samples you selected in the previous step; it can take some time

GDCdownload(query.exp)


# Load the downloaded files into a SummarizedExperiment (or a data frame)

expdat <- GDCprepare(
    query = query.exp,
    save = FALSE   # do not save the prepared object to disk
)
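
One way to double-check which samples are normals is to read the sample-type code from the TCGA barcode itself: characters 14-15 are "01" for a primary solid tumor and "11" for solid tissue normal. The clinical table is organized per case rather than per sample, which would explain why no normals show up there. The sketch below uses hypothetical helper names and made-up barcodes; with real data, it assumes the sample barcodes are available, for example as the column names of the prepared object.

```r
# Hypothetical helpers (not part of TCGAbiolinks): read the sample-type code
# stored in characters 14-15 of a TCGA sample barcode.
sample_type_code <- function(barcodes) {
  substr(barcodes, 14, 15)
}

# "11" is the TCGA code for Solid Tissue Normal
is_solid_normal <- function(barcodes) {
  sample_type_code(barcodes) == "11"
}

# Made-up barcodes, for illustration only
barcodes <- c(
  "TCGA-AA-0001-01A-11R-0000-07",  # primary tumor
  "TCGA-AA-0001-11A-11R-0000-07",  # solid tissue normal
  "TCGA-AA-0002-01A-11R-0000-07"   # primary tumor
)

normal_flags <- is_solid_normal(barcodes)
print(table(sample_type_code(barcodes)))
```

With a downloaded dataset, something like is_solid_normal(colnames(expdat)) should work, assuming the column names are full sample barcodes.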

The TCGAbiolinks reference manual has the details:

https://bioconductor.org/packages/devel/bioc/manuals/TCGAbiolinks/man/TCGAbiolinks.pdf

More TCGAbiolinks tutorials can be found here:

https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html

Good luck.


Thank you very much, I really appreciate it.

I read the TCGAbiolinks documentation, but I might have overlooked the Solid Tissue Normal option.


A few facts

  1. Almost all TCGA cases have normal blood DNA-Seq samples, except in a few projects such as LAML.
  2. Most TCGA projects have a small percentage of solid normal RNA-Seq samples.
  3. Many of these solid normals are not truly normal. They are often tumor-adjacent normals, which means they share some tumor-like features.
  4. Many people use GTEx normals as unpaired normals for TCGA data analysis. In fact, if you look at the GTEx data access applications, more than half of them are about co-analysis with TCGA.
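
Because the solid normals in TCGA are usually tumor-adjacent tissue from the same patient, it can help to see which cases have both a tumor and a normal sample. A minimal sketch, assuming standard TCGA barcodes where the first 12 characters identify the patient and characters 14-15 give the sample type; the helper name and the barcodes are made up for illustration.

```r
# Hypothetical helper: return patient IDs that have both a primary tumor ("01")
# and a solid tissue normal ("11") sample among the given barcodes.
paired_patients <- function(barcodes) {
  patient <- substr(barcodes, 1, 12)   # e.g. "TCGA-AA-0001"
  code <- substr(barcodes, 14, 15)     # sample-type code
  intersect(patient[code == "01"], patient[code == "11"])
}

# Made-up barcodes, for illustration only
example_barcodes <- c(
  "TCGA-AA-0001-01A-11R-0000-07",  # tumor, patient 0001
  "TCGA-AA-0001-11A-11R-0000-07",  # adjacent normal, patient 0001
  "TCGA-AA-0002-01A-11R-0000-07",  # tumor only, patient 0002
  "TCGA-AA-0003-11A-11R-0000-07"   # normal only, patient 0003
)

paired <- paired_patients(example_barcodes)
print(paired)
```

Patients returned here could be used for a paired tumor versus adjacent-normal comparison, while GTEx samples would serve as unpaired normals.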