Question

TCGAbiolinks: Accessing Matched Samples' Data Across -Omics Hierarchies

12 months ago

Desmond • 0

Hi All,

I'm an MSc student trying to gain experience in multi-omics data analysis. To this end, I've been working with the TCGAbiolinks and mixOmics R packages, but I would appreciate some guidance on how to proceed.

I start by downloading RNA-seq and miRNA sample metadata for TCGA-BRCA via TCGAbiolinks::GDCquery() and TCGAbiolinks::getResults(). I then deduplicate each output on its 'cases' column and check whether any cases are shared between the RNA-seq output and the miRNA output. The deduplication works as expected within each output, but the intersection between the two outputs is empty. My R script so far is below:

    library(TCGAbiolinks)
    library(tidyverse)
    library(stringr)

    deduplicate_tcga_query_outputs <- function(x){

      output <- x[!duplicated(x$cases),]

      return(output)

    }

    project <- "TCGA-BRCA"


    query_rnaseq <- GDCquery(
      project = project,
      data.category = "Transcriptome Profiling",
      data.type = "Gene Expression Quantification",
      access = "open"
      )


    query_mirna <- 
      GDCquery(
        project = project,
        data.category = "Transcriptome Profiling",
        data.type = "miRNA Expression Quantification",
        access = "open"
      )

    output_rnaseq <- getResults(query_rnaseq)
    output_mirna <- getResults(query_mirna)

    output_rnaseq <- deduplicate_tcga_query_outputs(output_rnaseq)
    output_mirna  <- deduplicate_tcga_query_outputs(output_mirna)

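    # Confirm that no duplicated 'cases' entries remain within each output
    sum(duplicated(output_rnaseq$cases))
    sum(duplicated(output_mirna$cases))

    # Look for cases present in both outputs; this comes back empty for me
    shared_cases <- intersect(output_rnaseq$cases, output_mirna$cases)

    shared_cases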

To my understanding, the mixOmics package requires multi-omics data to be measured across matching participants (when performing N-integration). Additionally, it provides a small multi-omics toy data set constructed from the TCGA-BRCA project to help users get started (so clearly, matching cases must exist within TCGA-BRCA across -omics hierarchies).
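One possible explanation I have not ruled out is that the 'cases' column holds full aliquot-level TCGA barcodes rather than participant IDs. In the TCGA barcode scheme, the first 12 characters identify the participant, while the later fields (sample, portion, plate, center) differ between assays. The sketch below uses made-up barcodes, not real query output, and only illustrates that comparing full barcodes can give an empty intersection where comparing the 12-character prefix does not. Whether this applies to the actual getResults() output is an assumption.

```r
# Hypothetical barcodes for illustration only; not taken from a real GDC query.
rnaseq_cases <- c("TCGA-A1-A0SB-01A-11R-A144-07",
                  "TCGA-A2-A04P-01A-31R-A034-07")
mirna_cases  <- c("TCGA-A1-A0SB-01A-11R-A145-13",
                  "TCGA-A7-A0CE-01A-11R-A010-13")

# Full barcodes encode plate and center, so they differ between assays.
full_matches <- intersect(rnaseq_cases, mirna_cases)

# Characters 1-12 of a TCGA barcode identify the participant.
rnaseq_patients <- substr(rnaseq_cases, 1, 12)
mirna_patients  <- substr(mirna_cases, 1, 12)

# Matching on the participant prefix instead of the full barcode.
patient_matches <- intersect(rnaseq_patients, mirna_patients)

print(full_matches)
print(patient_matches)
```

If this is the cause, matching at the sample level would presumably use a longer prefix (characters 1-16, which add the sample type and vial) so that tumor and normal samples from the same participant are not merged.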

If anyone has experience integrating samples across -omics hierarchies using TCGAbiolinks in R, I would greatly appreciate your input.

Thanks so much.

TCGAbiolinks mixOmics multi-omics TCGA-BRCA • 459 views