Hi All,
I'm an MSc student trying to gain experience in multi-omics data analysis. To this end, I've been working with the TCGAbiolinks and mixOmics R packages, but I would appreciate some clarification on how to proceed.
I start by retrieving RNA-seq and miRNA sample metadata for TCGA-BRCA with TCGAbiolinks::GDCquery() and TCGAbiolinks::getResults(). I then deduplicate each results table on its 'cases' column and check whether any cases are shared between the RNA-seq table and the miRNA table. The deduplication works (no duplicate cases remain within either table), but the intersection between the two tables is empty. My R script so far is below:
library(TCGAbiolinks)
library(tidyverse)
library(stringr)

# Keep only the first row for each distinct value in the 'cases' column
deduplicate_tcga_query_outputs <- function(x) {
  output <- x[!duplicated(x$cases), ]
  return(output)
}

project <- "TCGA-BRCA"

# Query open-access gene expression (RNA-seq) files
query_rnaseq <- GDCquery(
  project = project,
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  access = "open"
)

# Query open-access miRNA expression files
query_mirna <- GDCquery(
  project = project,
  data.category = "Transcriptome Profiling",
  data.type = "miRNA Expression Quantification",
  access = "open"
)

# Extract the results tables (one row per file) from each query
output_rnaseq <- getResults(query_rnaseq)
output_mirna <- getResults(query_mirna)

output_rnaseq <- deduplicate_tcga_query_outputs(output_rnaseq)
output_mirna <- deduplicate_tcga_query_outputs(output_mirna)

# Both of these return 0, so no duplicates remain within either table
sum(duplicated(output_rnaseq$cases))
sum(duplicated(output_mirna$cases))

# This comes back empty: no 'cases' values are shared between the two tables
shared_cases <- intersect(output_rnaseq$cases, output_mirna$cases)
shared_cases
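One thing I have not ruled out is that 'cases' holds full aliquot-level TCGA barcodes rather than participant IDs. In the TCGA barcode format, the first 12 characters (project, tissue source site, participant) identify the participant, and the rest identifies the sample, vial, portion, analyte, plate, and center. If that is what 'cases' contains, two aliquots from the same participant would never compare equal. Below is a minimal sketch of that idea. The barcodes are made up for illustration, and to_participant() is a hypothetical helper of mine, not a TCGAbiolinks function:

```r
# Hypothetical aliquot barcodes, invented for illustration only.
rnaseq_cases <- c("TCGA-A1-A0SB-01A-11R-A144-07", "TCGA-A2-A04P-01A-31R-A034-07")
mirna_cases  <- c("TCGA-A1-A0SB-01A-11R-A145-13", "TCGA-A7-A0CE-01A-11R-A010-13")

# Full barcodes differ in their trailing fields, so nothing matches.
full_match <- intersect(rnaseq_cases, mirna_cases)

# Hypothetical helper: keep the first 12 characters (participant level).
to_participant <- function(barcode) substr(barcode, 1, 12)

# Matching on the participant portion recovers the shared participant.
shared_participants <- intersect(
  to_participant(rnaseq_cases),
  to_participant(mirna_cases)
)
shared_participants
```

If this assumption holds for the real tables, the same substr() call could be applied to output_rnaseq$cases and output_mirna$cases before intersecting, but I have not confirmed that this is the intended way to match cases.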
To my understanding, the mixOmics package requires the different omics data sets to be measured on the same participants when performing N-integration. It also provides a small multi-omics toy data set constructed from the TCGA-BRCA project to help users get started, so matching cases must exist within TCGA-BRCA across omics layers.
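To make clear what I mean by "the same participants", here is a small sketch of the layout I am aiming for: a named list of matrices with participants in rows, restricted to the shared participants and sorted into one common order. As far as I understand, this is the shape that the mixOmics block functions (for example block.splsda()) take as their X argument. The matrices, participant IDs, and feature names are all made up:

```r
# Made-up matrices: rows are participants, columns are features.
mrna <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("P1", "P2", "P3"), c("geneA", "geneB"))
)
mirna <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("P3", "P1", "P4"), c("mirA", "mirB"))
)

# Participants measured in both data sets.
shared <- intersect(rownames(mrna), rownames(mirna))

# Subset both blocks to the shared participants, in the same row order.
blocks <- list(
  mrna  = mrna[shared, , drop = FALSE],
  mirna = mirna[shared, , drop = FALSE]
)

# Sanity check: the row names must line up exactly across blocks.
stopifnot(identical(rownames(blocks$mrna), rownames(blocks$mirna)))
```

With real data, the row names would be the participant-level identifiers, which is why I need the RNA-seq and miRNA cases to match in the first place.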
If anyone has experience integrating samples across -omics hierarchies using TCGAbiolinks in R, I would greatly appreciate your input.
Thanks so much.