Hi,
I'm exploring and integrating the TCGA-LUAD transcriptomic and genomic data. I'm trying to do so both with TCGAbiolinks in R and with cBioPortal.
With TCGAbiolinks, I access the data as follows, based on the package vignette (https://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/analysis.html#TCGAvisualize:_Visualize_results_from_analysis_functions_with_TCGA%E2%80%99s_data):
Transcriptomic data
```r
library(TCGAbiolinks)
library(SummarizedExperiment)  # provides assay()
library(stringr)               # provides str_sub()

query <- GDCquery(
  project = "TCGA-LUAD",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts",
  experimental.strategy = "RNA-Seq"
)
GDCdownload(query, method = "api", files.per.chunk = 1000, directory = "/home/arantxa/proyects/itziar/FIS")
LUAD <- GDCprepare(query = query, directory = "/home/arantxa/proyects/itziar/FIS")
LUADMatrix <- assay(LUAD, "unstranded")

# To check for outliers (sample correlation boxplot and AAIC plot), you can run:
LUAD.RNAseq_CorOutliers <- TCGAanalyze_Preprocessing(LUAD)

# Drop the last 12 characters of each aliquot barcode, keeping the sample barcode,
# so the columns can be matched with the genomic data
colnames(LUAD.RNAseq_CorOutliers) <- str_sub(colnames(LUAD.RNAseq_CorOutliers), start = 1, end = -13)
```
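As a minimal sketch of what the `str_sub(..., end = -13)` step does, assuming standard 28-character TCGA aliquot barcodes (the barcodes below are made up for illustration): dropping the last 12 characters leaves the 16-character sample barcode, and the first 12 characters give the patient barcode. Two side effects are worth checking: several aliquots of the same sample collapse onto the same column name, and normal-tissue samples (sample type codes 10 to 19) stay in the matrix alongside the tumours.

```r
# Hypothetical aliquot barcodes, for illustration only
aliquots <- c("TCGA-AA-0001-01A-11R-A000-07",
              "TCGA-AA-0001-01A-21R-A001-07",  # second aliquot of the same sample
              "TCGA-AA-0002-11A-01R-A002-07")  # normal tissue sample (type 11)

# Same result as stringr::str_sub(x, start = 1, end = -13) for 28-character barcodes
sample_ids  <- substr(aliquots, 1, 16)
patient_ids <- substr(aliquots, 1, 12)
sample_type <- as.integer(substr(aliquots, 14, 15))  # 1-9 tumour, 10-19 normal

print(sample_ids)
# TRUE here means two columns would end up with the same name after truncation
print(any(duplicated(sample_ids)))
```

Using base `substr()` with fixed positions is just one choice; it relies on the barcode fields having fixed widths, which holds for the standard TCGA format.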
Genomic data
```r
query <- GDCquery(
  project = "TCGA-LUAD",
  data.category = "Simple Nucleotide Variation",
  access = "open",
  legacy = FALSE,
  data.type = "Masked Somatic Mutation",
  workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query, directory = "/home/arantxa/proyects/itziar/FIS")
maf <- GDCprepare(query, directory = "/home/arantxa/proyects/itziar/FIS")
```
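When comparing "number of samples with a given mutation" between sources, the count depends on what is counted: variant rows, distinct samples, or distinct patients, and whether silent variants are included. The sketch below is a hypothetical helper (`count_mutated` is not part of TCGAbiolinks) run on a toy table that reuses the standard MAF column names `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode`; whether these choices explain the mismatch with cBioPortal is an assumption to check, not a confirmed cause.

```r
# Toy MAF-like table (made-up barcodes); a GDC MAF uses the same column names
maf_toy <- data.frame(
  Hugo_Symbol = c("EGFR", "EGFR", "KRAS", "EGFR", "TP53"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Missense_Mutation", "In_Frame_Del", "Nonsense_Mutation"),
  Tumor_Sample_Barcode = c("TCGA-AA-0001-01A-11D-A000-08",
                           "TCGA-AA-0001-01A-11D-A000-08",  # 2nd EGFR variant, same sample
                           "TCGA-AA-0002-01A-11D-A001-08",
                           "TCGA-AA-0003-01A-11D-A002-08",
                           "TCGA-AA-0003-01A-11D-A002-08"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count hits for one gene as variant rows,
# distinct samples (16 characters) and distinct patients (12 characters)
count_mutated <- function(maf, gene, non_silent = TRUE) {
  hits <- maf[maf$Hugo_Symbol == gene, ]
  if (non_silent) hits <- hits[hits$Variant_Classification != "Silent", ]
  c(variants = nrow(hits),
    samples  = length(unique(substr(hits$Tumor_Sample_Barcode, 1, 16))),
    patients = length(unique(substr(hits$Tumor_Sample_Barcode, 1, 12))))
}

print(count_mutated(maf_toy, "EGFR"))  # variants = 3, samples = 2, patients = 2
```

With the real data, the same function could be applied to the `maf` object returned by `GDCprepare()`, assuming it keeps those column names.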
However, when I look up TCGA-LUAD on cBioPortal, I find three different datasets (Firehose, Nature and PanCancer Atlas). The number of samples with specific mutations, etc., doesn't match the TCGAbiolinks cohort. Also, I can't compare the transcriptomic data across the different datasets.
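A first step in narrowing down the mismatch could be to check which patients each cohort actually contains. Below is a minimal sketch with made-up IDs. It assumes the cBioPortal study uses 15-character sample IDs without the vial letter (as its TCGA studies typically do), so both sides are reduced to 12-character patient barcodes before comparing. `compare_cohorts` is a hypothetical helper, and in practice the two vectors would come from the TCGAbiolinks column names and from a sample list exported from the cBioPortal study.

```r
# Hypothetical sample IDs, for illustration only
biolinks_samples <- c("TCGA-AA-0001-01A", "TCGA-AA-0002-01A",
                      "TCGA-AA-0002-11A", "TCGA-AA-0004-01A")
cbio_samples     <- c("TCGA-AA-0001-01", "TCGA-AA-0002-01", "TCGA-AA-0003-01")

# Compare at the patient level (first 12 characters), which sidesteps
# differences in sample/vial suffixes between the two sources
compare_cohorts <- function(a, b) {
  pa <- unique(substr(a, 1, 12))
  pb <- unique(substr(b, 1, 12))
  list(shared = intersect(pa, pb),
       only_a = setdiff(pa, pb),
       only_b = setdiff(pb, pa))
}

res <- compare_cohorts(biolinks_samples, cbio_samples)
print(lengths(res))  # shared = 2, only_a = 1, only_b = 1
```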
So my questions are:
1. Where does this difference come from?
2. What is the best way to explore this dataset on cBioPortal, given that it would be my first choice?