Question

How to get fpkm of TCGA data

2.0 years ago

Maryam • 0

Hello everyone, could you help me with the following? I am using TCGAbiolinks to download TCGA data, and I want FPKM values instead of raw counts. Since the TCGA data were updated, I have to use the "STAR - Counts" workflow instead of "HTSeq - Counts".

library(TCGAbiolinks)

stadquery <- GDCquery(project = "TCGA-STAD", 
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "STAR - Counts", legacy = FALSE,
                      experimental.strategy = "RNA-Seq") 


GDCdownload(query = stadquery, method = "api")


stadprpr <- GDCprepare(query = stadquery, summarizedExperiment = TRUE)

But when I extract the matrix with Exdata <- stadprpr@assays@data$fpkm_uq_unstrand, it has no column names (samples) or row names (genes). How can I fix this? Thanks in advance.

FPKM RNA-seq TCGAbiolinks • 1.6k views

score 0 · Answer 1 · 2022-11-11

That is because you are reading the data directly from the object's internal slots, which is not the recommended way to access it. The object is a SummarizedExperiment, so you should use the dedicated getter function assay():

library(SummarizedExperiment)

# show available assays
assayNames(stadprpr)

# get FPKM-UQ (upper-quartile normalized FPKM); check assayNames() for other units
assay(stadprpr, "fpkm_uq_unstrand")

# show sample annotations
colData(stadprpr)

# show gene annotations
rowData(stadprpr)

Accessing specialized data formats such as a SummarizedExperiment with @ leads to these kinds of hiccups. Dedicated accessor functions (getters/setters) exist for extraction and subsetting operations; see https://bioconductor.org/packages/release/bioc/vignettes/SummarizedExperiment/inst/doc/SummarizedExperiment.html#assays
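To make the behaviour of the getter concrete without downloading anything, here is a minimal sketch on a toy SummarizedExperiment. The gene IDs, sample names, values, and the gene_name column are all made up for illustration; the real object returned by GDCprepare() is assumed to carry a similar annotation column, so check colnames(rowData(stadprpr)) before relying on it.

```r
library(SummarizedExperiment)

# Toy stand-in for the object returned by GDCprepare(); all values are made up.
mat <- matrix(
  c(1.5, 0.0, 3.2,   # first column  (sample_1)
    2.1, 4.4, 0.7),  # second column (sample_2)
  nrow = 3, ncol = 2,
  dimnames = list(
    c("ENSG_A", "ENSG_B", "ENSG_C"),  # hypothetical gene IDs
    c("sample_1", "sample_2")         # hypothetical sample barcodes
  )
)

se <- SummarizedExperiment(
  assays  = list(fpkm_uq_unstrand = mat),
  rowData = DataFrame(gene_name = c("GENE1", "GENE2", "GENE3")),
  colData = DataFrame(condition = c("tumor", "normal"),
                      row.names = colnames(mat))
)

# The getter returns the matrix with gene IDs as rownames and samples as colnames.
fpkm <- assay(se, "fpkm_uq_unstrand")

# Optionally label rows by gene symbol instead, taken from the row annotations.
# (Assumes the symbols are unique; duplicated symbols would need handling first.)
fpkm_by_symbol <- fpkm
rownames(fpkm_by_symbol) <- rowData(se)$gene_name
```

With the real data, the same two steps would be applied to stadprpr in place of the toy object se.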