Differential expression analysis error - TCGAbiolinks
0
0
Entering edit mode
5.5 years ago
Misty • 0

Hi Everyone,

I am trying to analyze TCGA-BRCA (harmonized) data using TCGAbiolinks and I am stuck at the DEA stage where I get the following error:

Error in dataFilt.brca[, dataFiltNT] : subscript out of bounds

I can't seem to pinpoint where and what is giving me this error. I have attached my code which I put together trying to teach myself using tutorials. What I want to do is: compare expression of my gene of interest (SOX10) between different PAM50 subtypes. The thing is, when I do DE on normal versus tumor, it shows that SOX10 is downregulated. However, when I try to visualize SOX10 expression on UCSC (even cBioportal) I see that it is upregulated in basal tumors compared to normal (or normal-like, I'm so confused?!!). I specifically want to show that even though SOX10 is downregulated in pooled subtypes when compared to normal, it is upregulated if only basal subtype is considered. Also, I tried doing multiple constrasts using the contrast.matrix function but it gave me a whole lot of errors so I gave up.

Is there a simpler way to do this? I really need to show stats on a nice boxplot of Log2FC versus PAM50 subtype! I also want to further stratify each Pam50 into Her2+ and Her2-

Please help, and thank you!!

TCGA BRCA - SOX10

library(TCGAbiolinks)
library(ggplot2)
library(dplyr)
library(DESeq2)

Cancer <- "TCGA-BRCA"
query_BRCA <- GDCquery(project = Cancer,
                       data.category = "Transcriptome Profiling",
                       data.type = "Gene Expression Quantification", 
                       workflow.type = "HTSeq - Counts")

samples_BRCA <- getResults(query_BRCA,cols=c("cases"))

queryDown_brca <- GDCquery(project = Cancer, 
                            data.category = "Transcriptome Profiling",
                            data.type = "Gene Expression Quantification", 
                            workflow.type = "HTSeq - Counts", 
                            barcode = samples_BRCA)

GDCdownload(query = queryDown_brca, method = "api", files.per.chunk = 50)

dataPrep1.brca <- GDCprepare(query = queryDown_brca, 
                             save = TRUE, 
                             save.filename = "TCGA_BRCA_HTSeq_Counts.rda")

dataPrep.brca <- TCGAanalyze_Preprocessing(object = dataPrep1.brca, 
                                           cor.cut = 0.6,
                                           datatype = "HTSeq - Counts")

dataNorm.brca <- TCGAanalyze_Normalization(tabDF = dataPrep.brca,
                                           geneInfo = geneInfoHT,
                                           method = "gcContent")

dataNorm.brca <- TCGAanalyze_Normalization(tabDF = dataNorm.brca,
                                           geneInfo = geneInfoHT,
                                           method = "geneLength")

dataFilt.brca <- TCGAanalyze_Filtering(tabDF = dataNorm.brca,
                                       method = "quantile", 
                                       qnt.cut =  0.25)

dim(dataFilt.brca)

### Prepping for DEA ###

sampleTP <- TCGAquery_SampleTypes(barcode = colnames(dataFilt.brca), typesample = "TP")
sampleNT <- TCGAquery_SampleTypes(barcode = colnames(dataFilt.brca), typesample = "NT")
dataFiltTP <- dataFilt.brca[,sampleTP]
dataFiltNT <- dataFilt.brca[,sampleNT]

dataSubt_BRCA <-TCGAquery_subtype(tumor = "BRCA")
dataSubt_BRCA$BRCA_Subtype_PAM50
samplePam50_BRCA.LumA <- dataSubt_BRCA[dataSubt_BRCA$BRCA_Subtype_PAM50 %in% "LumA",]

samplePam50_BRCA.LumA$BRCA_Subtype_PAM50

dataFiltTP.LumA <- dataFiltTP[,substr(colnames(dataFiltTP),1,12) %in% samplePam50_BRCA.LumA$patient]

dim(dataFiltNT)
dim(dataFiltTP.LumA)

DEG_LumA_NT <- TCGAanalyze_DEA(mat1 = dataFilt.brca[,dataFiltNT],
                            mat2 = dataFilt.brca[,dataFiltTP.LumA],
                            pipeline="limma",
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.05 ,
                            logFC.cut = 1,
                            voom=TRUE,
                            method = "glmLRT", ClinicalDF = data.frame())

Error in dataFilt.brca[, dataFiltNT] : subscript out of bounds

RNA-Seq TCGA BRCA TCGAbiolinks • 2.5k views
ADD COMMENT
1
Entering edit mode

There are issues, generally, with the functionality of TCGAanalyze_DEA(). It was first elaborated here: Matched paired DEA analysis of BRCA samples using TCGAbiolinks, works for paired = FALSE but not paired = TRUE

I contacted the main developer, who then forwarded it to another developer responsible for that part of the package, and nobody ever got back to me. I then raised a ticket (issue) on GitHub about it but none of the developers responded.

If I may suggest that you obtain the data directly from the GDC and process it from there.

ADD REPLY
0
Entering edit mode

Hi Kevin, is there any update on this issue? i am also facing the same issue [Error in dataFilt[, samples] : subscript out of bounds)]. Thanks in advance.

ADD REPLY
1
Entering edit mode

Nope, there was never a response: https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues/276

As I'm now a developer myself, I know that it can genuinely be difficult to find time to follow up to these issues. However, in the case of the 'paired' analysis issue, TCGAbiolinks is actually providing misleading results.

If you can, I would get the data from Xena Browser and then re-process it. If you're just interested in a single gene, you could also use cBioPortal (GUI)

ADD REPLY
0
Entering edit mode

Did you try the code in my other linked thread?

ADD REPLY
0
Entering edit mode

No, I did not try, yet. But, I'll try to give it a go. I was working on manually curated datasets (patient IDs divided based on clinical features). I tried codes on several different sets, sometime it works, sometimes it doesn't.

ADD REPLY
0
Entering edit mode

I have tried the TCGAutils R package it worked for me. They have provided functions for multi-way ID/UUID/barcode conversion. Have a look once. http://bioconductor.org/packages/release/bioc/vignettes/TCGAutils/inst/doc/TCGAutils.html#tcga-barcode-to-uuid

ADD REPLY

Login before adding your answer.

Traffic: 1844 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6