Question

Download all cases from TCGAbiolinks


4.1 years ago

aksam ▴ 10

Hi all, I would like to download the bulk RNA-seq data for all patients in the TCGA-LUAD cohort using TCGAbiolinks. Does this exist as a single matrix?

I have read the package vignette and can download individual cases; however, does TCGAbiolinks support downloading a single matrix covering all patients?

I ask because downloading similar data from the UCSC Xena browser gives you a single 585-column matrix.

I tried this with TCGAbiolinks:

library(TCGAbiolinks)
test <- GDCquery(project = "TCGA-LUAD", data.category = "Gene expression", data.type = "Gene expression quantification",
                 platform = "Illumina HiSeq", file.type = "results", legacy = TRUE)
dim(getResults(test))

This returns 600 files.

I used the code below to check whether one file was much larger than the others (which might indicate a combined matrix), but none is, so the 600 files appear to be individual per-sample files rather than one matrix:

library(dplyr) # for %>%, arrange(), filter() and select()
getResults(test) %>% arrange(desc(file_size)) %>% head(10)

Finally, I looked at the duplicated cases. Some patients have one file for cancer tissue and one for normal tissue (which is fine), but others have 2 or 3 files that are all for cancer tissue. Which file should I choose?

results <- getResults(test)
# Patients (cases) with more than one file; unique() so each patient is listed once
dups <- unique(results$cases.submitter_id[duplicated(results$cases.submitter_id)])

for (i in seq_along(dups)) {
    print(dups[i])
    print(results %>% filter(cases.submitter_id == dups[i]) %>% select(sample_type))
}
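One way to make the choice reproducible is to work from the TCGA aliquot barcodes themselves. Below is a minimal sketch, assuming standard barcodes such as "TCGA-05-4244-01A-01R-1107-07", where characters 1-12 identify the patient and characters 14-15 give the sample type code ("01" = primary solid tumour, "11" = solid tissue normal). The helper name pick_one_tumour_per_patient() is hypothetical, and the alphabetical tie-break between replicate tumour aliquots is an arbitrary assumption, not a biological criterion.

```r
# Hypothetical helper: keep one primary-tumour aliquot per patient.
# Assumes standard TCGA aliquot barcodes, e.g. "TCGA-05-4244-01A-01R-1107-07":
#   characters 1-12  -> patient (participant) ID
#   characters 14-15 -> sample type code ("01" = primary solid tumour)
pick_one_tumour_per_patient <- function(barcodes) {
  sample_type <- substr(barcodes, 14, 15)
  # Keep primary tumour aliquots only
  tumour <- barcodes[sample_type == "01"]
  # Sort so the tie-break is deterministic (alphabetical order is an arbitrary choice)
  tumour <- sort(tumour)
  # duplicated() flags every occurrence after the first, so this keeps the first per patient
  tumour[!duplicated(substr(tumour, 1, 12))]
}
```

If the columns of an expression matrix are named by aliquot barcode, the result can be used to subset the columns, e.g. mat[, pick_one_tumour_per_patient(colnames(mat))].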

Any help is appreciated; thanks in advance.


updated 3.1 years ago by Hamid Ghaedi 3.3k • written 4.1 years ago by aksam ▴ 10


Thanks, I managed to download the whole matrix using this. There are still duplicated entries (e.g. more than one tumour sample for the same patient) with no obvious rationale for which one to drop, but at least I have the whole matrix now.

(Apologies, this should be a reply to Hamid Ghaedi's answer, but I can't seem to get that to work.)

4.1 years ago by aksam ▴ 10


Are you not able to use the ADD COMMENT button?

4.1 years ago by GenoMax 147k


Hi, yes, it seems to be working now. Thanks!

4.1 years ago by aksam ▴ 10

score 2 · Answer 1 · 2020-10-16


4.1 years ago

Hamid Ghaedi 3.3k

Yes, GDCprepare() returns all samples in a single object from which you can extract the matrix. Try this:

library("TCGAbiolinks") # Bioconductor package
query_TCGA <- GDCquery(
  project = "TCGA-LUAD",
  data.category = "Transcriptome Profiling", # data.category is a required argument of GDCquery
  experimental.strategy = "RNA-Seq",
  workflow.type = "HTSeq - Counts") # newer GDC data releases provide "STAR - Counts" instead

GDCdownload(query = query_TCGA)

dat <- GDCprepare(query = query_TCGA, save = TRUE, save.filename = "exp.rda")


# Expression matrix: genes in rows, samples (TCGA barcodes) in columns
rna <- as.data.frame(SummarizedExperiment::assay(dat))
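Because the matrix mixes tumour and normal samples, a common next step is to split the columns by sample type. A minimal sketch, assuming the column names of rna are TCGA aliquot barcodes and using the TCGA sample type code ranges (01-09 tumour, 10-19 normal, 20-29 control); the helper name classify_sample_type() is hypothetical.

```r
# Hypothetical helper: label TCGA barcodes by the sample type code
# stored in characters 14-15 of the barcode.
#   01-09 -> tumour, 10-19 -> normal, 20-29 -> control
classify_sample_type <- function(barcodes) {
  code <- as.integer(substr(barcodes, 14, 15))
  ifelse(code < 10, "tumour", ifelse(code < 20, "normal", "control"))
}
```

For example, table(classify_sample_type(colnames(rna))) counts samples per group, and rna[, classify_sample_type(colnames(rna)) == "tumour"] keeps only the tumour columns.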

4.1 years ago by Hamid Ghaedi 3.3k


I used this code to download the data, but I get a Unicode issue, as shown in the attached screenshots.

How can I solve this problem?

3.1 years ago by Amani • 0


Please paste your code here so that other people can see what you have tried and help with the issue.

3.1 years ago by Hamid Ghaedi 3.3k