Hi there,
I have always used TCGAbiolinks to get raw counts for TCGA projects, like this:
expquery <- GDCquery(project = "TCGA-KIRC",
                     data.category = "Transcriptome Profiling",
                     data.type = "Gene Expression Quantification",
                     workflow.type = "HTSeq - Counts")
GDCdownload(expquery, directory = "GDCdata")
expquery2 <- GDCprepare(expquery, directory = "GDCdata", summarizedExperiment = TRUE)
expMatrix <- TCGAanalyze_Preprocessing(expquery2)
However, this stopped working today; it seems "HTSeq - Counts" is no longer offered as a workflow type:
Error in GDCquery(project = "TCGA-KIRC", data.category = "Transcriptome Profiling", :
Please set a valid workflow.type argument from the list below:
=> STAR - Counts
Therefore, I used "STAR - Counts" to download the data, which comes in a completely different format from what I used to get with "HTSeq - Counts". The expression table for each sample has more columns, including fpkm_unstranded and tpm_unstranded.
The data downloaded with "STAR - Counts" is actually much more useful, but I do not know how to turn the files into a readable expression matrix (Ensembl IDs as row names and TCGA tumor barcodes as column names), because the GDCprepare() function fails on it:
expquery2 <- GDCprepare(expquery, directory = "GDCdata", summarizedExperiment = TRUE)
| | 0%
Error in readr::read_tsv(file = f, col_names = TRUE, progress = FALSE, :
unused argument (show_col_types = FALSE)
Error in if (value == n) { : argument is of length zero
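One thing worth checking here, as a guess rather than a confirmed diagnosis: the "unused argument (show_col_types = FALSE)" message is what R reports when the installed readr::read_tsv() does not have that argument, and show_col_types was added in readr 2.0.0. A minimal check, assuming nothing beyond base R and an optional readr install:

```r
# Check whether the installed readr accepts `show_col_types`.
# Assumption: an outdated readr is the cause of the first error; this only tests
# for the argument and does not explain the second error.
has_readr <- requireNamespace("readr", quietly = TRUE)
supports_arg <- has_readr &&
  "show_col_types" %in% names(formals(readr::read_tsv))

if (supports_arg) {
  message("readr::read_tsv() accepts show_col_types; the cause is probably elsewhere.")
} else {
  message("readr is missing or too old; updating it may help: install.packages(\"readr\")")
}
```

If the check reports an old readr, updating it and restarting the R session before rerunning GDCprepare() would be the first thing to try.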
Does anyone have a solution? Many thanks in advance!
I'm stuck too
lol, let's wait and see if any hero can save us!
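Until GDCprepare() works again, one possible workaround is to read the per-sample files directly and bind one column per sample. The sketch below uses base R only and is not the TCGAbiolinks method. It assumes each file is a tab-separated table with a leading "#" comment line, a gene_id column, summary rows whose gene_id starts with "N_", and a raw-count column named unstranded; please check these against your own files. build_count_matrix(), make_toy_file(), and the two barcodes are made up for illustration.

```r
# Build a gene x sample matrix from per-sample count tables.
# `files`: named character vector (names = sample barcodes, values = file paths).
# `value_col`: column to extract; "unstranded" is an assumed name for raw counts.
build_count_matrix <- function(files, value_col = "unstranded") {
  columns <- lapply(files, function(path) {
    # Skip "#" comment lines and keep the header row.
    tab <- read.delim(path, comment.char = "#", stringsAsFactors = FALSE)
    # Drop summary rows such as N_unmapped (assumed to start with "N_").
    tab <- tab[!grepl("^N_", tab$gene_id), ]
    setNames(tab[[value_col]], tab$gene_id)
  })
  # All samples must list the same genes in the same order.
  genes <- names(columns[[1]])
  same_genes <- vapply(columns, function(x) identical(names(x), genes), logical(1))
  stopifnot(all(same_genes))
  mat <- do.call(cbind, columns)
  rownames(mat) <- genes
  mat
}

# Toy files that mimic the assumed layout, so the sketch runs without GDC data.
make_toy_file <- function(counts) {
  path <- tempfile(fileext = ".tsv")
  writeLines(c(
    "# toy comment line",
    "gene_id\tgene_name\tunstranded\ttpm_unstranded",
    "N_unmapped\t\t100\t",
    sprintf("ENSG00000000003.15\tTSPAN6\t%d\t1.5", counts[1]),
    sprintf("ENSG00000000005.6\tTNMD\t%d\t0.0", counts[2])
  ), path)
  path
}

# Hypothetical barcodes used as column names.
toy_files <- c(
  "TCGA-XX-0001-01A" = make_toy_file(c(10L, 0L)),
  "TCGA-XX-0002-01A" = make_toy_file(c(25L, 3L))
)
toy_matrix <- build_count_matrix(toy_files)
print(toy_matrix)
```

For real data, files would hold the downloaded paths named by barcode. As far as I know, getResults(expquery) returns a table that links each file name to its case barcode, but verify the column names in your own query before relying on it. Passing value_col = "tpm_unstranded" should give a TPM matrix instead, assuming that column is present as described in the question.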