Download all cases from TCGAbiolinks
4.1 years ago
aksam ▴ 10

Hi all, I would like to download the bulk RNA-seq data for all patients in the TCGA-LUAD cohort using TCGAbiolinks. Does this exist as a single matrix?

I have read the package vignette and can download individual cases. However, does TCGAbiolinks support downloading a single matrix covering all the patients?

I ask because the similar data from the Xena browser can be downloaded as a single 585-column matrix.

I tried this with TCGAbiolinks:

library(TCGAbiolinks)
library(dplyr) # provides %>%, arrange, filter and select used below
test <- GDCquery(project = 'TCGA-LUAD', data.category = 'Gene expression', data.type = 'Gene expression quantification', platform = "Illumina HiSeq", file.type = 'results', legacy = TRUE)
dim(getResults(test))

This results in 600 files.

To check whether one file was much bigger than the others (which would suggest a combined matrix), I ran the code below. None is, so each of the 600 files appears to hold a single sample:

getResults(test) %>% arrange(desc(file_size)) %>% head(10)

Finally, I looked at the duplicated cases. Some cases have one file for cancer tissue and one for normal tissue (this is OK), but other patients have 2 or 3 files that are all for cancer tissue. Which file should I choose?

res <- getResults(test)
# Patients that appear in more than one file (each patient listed once)
dups <- unique(res$cases.submitter_id[duplicated(res$cases.submitter_id)])

for (id in dups) {
    print(id)
    print(res %>% filter(cases.submitter_id == id) %>% select(sample_type))
}
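The same check can be done in one cross-tabulation instead of a loop. This is only a sketch on a made-up data frame: the two column names are the ones used above, but the patient IDs and values are invented, so swap in getResults(test) for real use.

```r
# Toy stand-in for getResults(test); column names match the query results
# used above, the values are invented for illustration.
res <- data.frame(
  cases.submitter_id = c("TCGA-AA-0001", "TCGA-AA-0001", "TCGA-AA-0002",
                         "TCGA-AA-0002", "TCGA-AA-0002", "TCGA-AA-0003"),
  sample_type = c("Primary Tumor", "Solid Tissue Normal", "Primary Tumor",
                  "Primary Tumor", "Primary Tumor", "Primary Tumor"),
  stringsAsFactors = FALSE
)

# Number of files per patient (rows) and sample type (columns)
counts <- table(res$cases.submitter_id, res$sample_type)
print(counts)

# Patients with more than one file of the same sample type
multi <- rownames(counts)[apply(counts > 1, 1, any)]
print(multi)
```

Patients listed in multi are the ones where a choice between files has to be made; a tumour/normal pair on its own does not show up there.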

Any help appreciated, thanks in advance

RNA-Seq R • 3.7k views

Thanks, I managed to download the whole matrix using this. There are still duplicated entries (e.g. two or more tumour samples for the same patient) with no obvious rationale for which to drop, but at least I have the whole matrix now.

(Apologies, this should be a reply to the answer above, but I can't seem to get that to work.)


Are you not able to use the ADD COMMENT button?


Hi, yes, it seems to be working now. Thanks.

4.1 years ago

Yes, it can give you a single matrix. Try this:

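library("TCGAbiolinks") # Bioconductor package
query_TCGA = GDCquery(
  project = "TCGA-LUAD",
  data.category = "Transcriptome Profiling", # parameter enforced by GDCquery
  experimental.strategy = "RNA-Seq",
  workflow.type = "HTSeq - Counts") # newer GDC data releases name this workflow "STAR - Counts"

# Download the files matched by the query
GDCdownload(query = query_TCGA)

# Combine the downloaded files into one SummarizedExperiment and save it to exp.rda
dat <- GDCprepare(query = query_TCGA, save = TRUE, save.filename = "exp.rda")

# Expression matrix: genes in rows, one column per sample
rna <- as.data.frame(SummarizedExperiment::assay(dat))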
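On the duplicated tumour samples raised in the question: assuming the column names of rna are full TCGA aliquot barcodes, the patient and sample type can be read straight from the barcode (characters 1 to 12 are the patient, characters 14 to 15 the sample type code, with "01" for primary tumour and "11" for solid tissue normal). The sketch below uses made-up barcodes, and the tie-break (keep the first barcode in sorted order per patient) is an arbitrary assumption for illustration, not an official rule.

```r
# Made-up aliquot barcodes standing in for colnames(rna)
barcodes <- c(
  "TCGA-AA-0001-01A-11R-0001-07",
  "TCGA-AA-0001-11A-11R-0001-07",
  "TCGA-AA-0002-01A-11R-0002-07",
  "TCGA-AA-0002-01A-12R-0003-07",
  "TCGA-AA-0002-01B-11R-0004-07",
  "TCGA-AA-0003-01A-11R-0005-07"
)

patient     <- substr(barcodes, 1, 12)  # project-TSS-participant
sample_code <- substr(barcodes, 14, 15) # "01" primary tumour, "11" solid tissue normal

# How many primary tumour aliquots does each patient have?
print(table(patient[sample_code == "01"]))

# Keep primary tumours only, in a locale-independent sorted order
tumour <- sort(barcodes[sample_code == "01"], method = "radix")

# Assumed tie-break: first barcode in sorted order per patient
keep <- tumour[!duplicated(substr(tumour, 1, 12))]
print(keep)
```

With real data, the same idea would be barcodes <- colnames(rna) followed by rna[, keep] to get one tumour column per patient. Whether the first aliquot is the right one to keep depends on the analysis, so treat the tie-break as a placeholder.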

I used this code to download the data, but I get a Unicode issue like the one in the pictures here: unicode issue

How can I solve this problem?


Please paste your code here so other people can see what you have tried. Then they might be able to help with the issue.
