16 months ago
applepie
Hello everyone. I have recently been running a survival analysis of liver cancer patients based on CD47 expression, using TCGA data from the "TCGA-LIHC" project. After downloading the gene expression data for 20 primary tumor samples, I tried to extract the counts from the downloaded files, but ran into a problem I don't understand:
The R console gave me this output:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)
I don't know what happened. Does anyone know what causes this?
The code I used to generate the data is as follows:
library(TCGAbiolinks)  # provides GDCquery, getResults, GDCdownload, GDCprepare

# get gene expression data -----------
# build a query to get gene expression data for the entire cohort
query_liver_all <- GDCquery(
  project = "TCGA-LIHC",
  data.category = "Transcriptome Profiling", # parameter enforced by GDCquery
  experimental.strategy = "RNA-Seq",
  workflow.type = "STAR - Counts",
  data.type = "Gene Expression Quantification",
  sample.type = "Primary Tumor",
  access = "open")
output_liver <- getResults(query_liver_all)
# get 20 primary tissue sample barcodes
tumor <- output_liver$cases[1:20]
# OR, filtering explicitly on the sample type column
tumor <- output_liver[output_liver$sample_type == "Primary Tumor", "cases"][1:20]
tumor
# get gene expression data from the 20 primary tumors
query_liver <- GDCquery(
  project = "TCGA-LIHC",
  data.category = "Transcriptome Profiling", # parameter enforced by GDCquery
  experimental.strategy = "RNA-Seq",
  workflow.type = "STAR - Counts",
  data.type = "Gene Expression Quantification",
  sample.type = c("Primary Tumor", "Solid Tissue Normal"),
  access = "open",
  barcode = tumor)
# download data
GDCdownload(query_liver)
# get counts
tcga_liver_data <- GDCprepare(query_liver, summarizedExperiment = TRUE)
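
The message quoted above is the generic warning that the vroom reader prints when a value in a delimited file does not match the expected column type. The following is only a minimal sketch of what `problems()` reports, run on a small made-up file rather than on the TCGA downloads. The file contents and the column names `gene_id` and `unstranded` are assumptions for illustration, and this is not what TCGAbiolinks does internally.

```r
# Minimal sketch (not TCGAbiolinks code): trigger the vroom parsing message
# on a tiny hypothetical file and inspect it with problems().
library(vroom)

# Hypothetical two-column table; "oops" cannot be parsed as a number.
tmp <- tempfile(fileext = ".tsv")
writeLines(
  c("gene_id\tunstranded", "ENSG_A\t10", "ENSG_B\toops", "ENSG_C\t30"),
  tmp
)

# Force the second column to double so the bad value is recorded as a problem.
dat <- vroom(
  tmp,
  delim = "\t",
  col_types = cols(gene_id = col_character(), unstranded = col_double())
)

# One row per value that could not be parsed (location, expected vs. actual).
issues <- problems(dat)
print(issues)

# The unparseable value is read as NA; the remaining rows are unaffected.
print(dat$unstranded)
```

If the table returned by `problems()` only points at a few header or summary lines of each file, the gene-level counts may well be intact, but that would need to be checked against the object returned by `GDCprepare`.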