5 months ago
Nusrat
Can anyone please help?
I was doing a differential gene expression (DEG) analysis in RStudio, following the STAR Protocols network-analysis protocol for testicular cancer (link below). I got stuck at step 7.
This is the output it gives:
This is the protocol link: link
Here's the full R code:
```r
setwd("/pathname")

### Step 3
library(UCSCXenaTools)
library(data.table)
library(R.utils)
library(dplyr)

### Step 4
data(XenaData)
write.csv(XenaData, "00_tblXenaHubInfo.csv")

### Step 5
GeneExpectedCnt_toil = XenaGenerate(subset = XenaHostNames == "toilHub") %>%
  XenaFilter(filterCohorts = "TCGA TARGET GTEx") %>%
  XenaFilter(filterDatasets = "TcgaTargetGtex_gene_expected_count")
XenaQuery(GeneExpectedCnt_toil) %>%
  XenaDownload(destdir = "./")

paraCohort = "TCGA testicular Cancer"  # Selecting the Testicular Cancer cohort.
paraDatasets = "TCGA.TGCT.sampleMap/TGCT_clinicalMatrix"  # Selecting the Testicular Cancer clinical matrix.
Clin_TCGA = XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
  XenaFilter(filterCohorts = paraCohort) %>%
  XenaFilter(filterDatasets = paraDatasets)
XenaQuery(Clin_TCGA) %>%
  XenaDownload(destdir = "./")

Surv_TCGA = XenaGenerate(subset = XenaHostNames == "toilHub") %>%
  XenaFilter(filterCohorts = "TCGA TARGET GTEx") %>%
  XenaFilter(filterDatasets = "TCGA_survival_data")
XenaQuery(Surv_TCGA) %>%
  XenaDownload(destdir = "./")

Pheno_GTEx = XenaGenerate(subset = XenaHostNames == "toilHub") %>%
  XenaFilter(filterCohorts = "TCGA TARGET GTEx") %>%
  XenaFilter(filterDatasets = "TcgaTargetGTEX_phenotype")
XenaQuery(Pheno_GTEx) %>%
  XenaDownload(destdir = "./")

### Step 6
filterGTEx01 = fread("TcgaTargetGTEX_phenotype.txt.gz")
names(filterGTEx01) = gsub("\\_", "", names(filterGTEx01))
paraStudy = "GTEX"  # Setting "GTEx" as the study of interest.
paraPrimarySiteGTEx = "Testes"
paraPrimaryTissueGTEx = "^Testes"
filterGTEx02 = subset(filterGTEx01,
                      study == paraStudy &
                        primarysite == paraPrimarySiteGTEx &
                        grepl(paraPrimaryTissueGTEx, filterGTEx01$`primary disease or tissue`))

filterTCGA01 = fread(paraDatasets)
names(filterTCGA01) = gsub("\\_", "", names(filterTCGA01))
paraSampleType = "Primary Tumor"  # Setting "Primary Tumor" as the sample type of interest.
paraPrimarySiteTCGA = "Testes"
paraHistologicalType = "Testicular Germ Cell Tumor"
filterTCGA02 = subset(filterTCGA01,
                      sampletype == paraSampleType &
                        primarysite == paraPrimarySiteTCGA &
                        grepl(paraHistologicalType, filterTCGA01$histologicaltype))

filterExpr = c(filterGTEx02$sample, filterTCGA02$sampleID, "sample")
ExprSubsetBySamp = fread("TcgaTargetGtex_gene_expected_count.gz",
                         select = filterExpr)

### Step 7 (the problem starts here)
probemap = fread("zz_gencode.v23.annotation.csv", select = c(1, 2))
exprALL = merge(probemap, ExprSubsetBySamp, by.x = "sampleID", by.y = "sample")
genesPC = fread("zz_gene.protein.coding.csv")
exprPC = subset(exprALL, gene %in% genesPC$Gene_Symbol)
```
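A small check placed just before the failing `subset()` call makes step 7 stop with a message that lists the columns the table actually has, instead of only reporting a missing `gene` column. This is a minimal base-R sketch; `require_columns()` is a hypothetical helper, not part of the protocol or of data.table, and the toy table uses made-up values.

```r
# Hypothetical helper (not from the protocol): stop with an informative
# message if any required column is absent, listing the columns present.
require_columns <- function(df, cols, df_name = deparse(substitute(df))) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop(sprintf("%s is missing column(s): %s. Available: %s",
                 df_name,
                 paste(missing_cols, collapse = ", "),
                 paste(names(df), collapse = ", ")),
         call. = FALSE)
  }
  invisible(TRUE)
}

# In the step 7 script it would sit just before the failing line:
# require_columns(exprALL, "gene")
# exprPC = subset(exprALL, gene %in% genesPC$Gene_Symbol)

# Toy stand-in for exprALL without a gene column (made-up values).
toy <- data.frame(sampleID = c("ENSG0001", "ENSG0002"), S1 = c(3, 8))
msg <- tryCatch(require_columns(toy, "gene"), error = conditionMessage)
print(msg)
# [1] "toy is missing column(s): gene. Available: sampleID, S1"
```

Because it only reads `names()`, the same check works on a data.frame or a data.table.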
The column `gene` does not exist in the `exprALL` data frame. Inspect the data frame: `exprALL` may use a different column name for genes, such as `Gene_Symbol` or something similar. Can you show the output of `head(exprALL)`?

Thanks for responding. Let me show you all the column names for these files. I'm also adding links to these files for your understanding.
https://toil-xena-hub.s3.us-east-1.amazonaws.com/download/TcgaTargetGtex_gene_expected_count.gz
https://osf.io/edjzv/
I believe you; it's a 1.20 GB file. Looking at the data, it seems that something went wrong in the merge. Could you show me `ExprSubsetBySamp`, please? The goal is to merge these data frames using only the selected columns.
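One way to test a merge before running it is to confirm that both key columns exist and to count how many key values they share; with the default inner join, a shared count of zero means `merge()` returns no rows. The sketch below is base R; `check_merge_keys()` is a hypothetical helper, and the toy tables only mimic the layout (an ID column plus a gene column on one side, an ID column plus sample columns on the other) with made-up IDs and values.

```r
# Hypothetical helper: verify merge key columns and count shared key values.
check_merge_keys <- function(x, y, by.x, by.y) {
  if (!by.x %in% names(x)) stop(sprintf("by.x column '%s' not in x", by.x), call. = FALSE)
  if (!by.y %in% names(y)) stop(sprintf("by.y column '%s' not in y", by.y), call. = FALSE)
  shared <- intersect(x[[by.x]], y[[by.y]])  # key values present in both tables
  c(x_keys = length(unique(x[[by.x]])),
    y_keys = length(unique(y[[by.y]])),
    shared = length(shared))
}

# Toy tables (made-up IDs and values).
probemap_toy <- data.frame(id = c("ENSG0001.1", "ENSG0002.4", "ENSG0003.2"),
                           gene = c("TP53", "KIT", "MALAT1"))
expr_toy <- data.frame(sample = c("ENSG0001.1", "ENSG0003.2"),
                       TCGA_A = c(5.1, 0.3), GTEX_B = c(4.2, 1.7))

print(check_merge_keys(probemap_toy, expr_toy, by.x = "id", by.y = "sample"))

merged <- merge(probemap_toy, expr_toy, by.x = "id", by.y = "sample")
print(names(merged))  # key column first (named as in x), then gene, then the sample columns
```

The data.table `merge()` method accepts the same `by.x`/`by.y` arguments, so the check applies unchanged to tables read with `fread()`.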
`ExprSubsetBySamp` is the 1.20 GB file you just mentioned.
Here's its column.
Please do not post screenshots of text content. You can copy/paste the text into the edit window, then use the 101010 button to format it as code.

You are filtering `ExprSubsetBySamp` using `filterExpr`, which contains only the sample ID columns and `sample`. It is therefore expected that it has only one column that is not a sample, and that column is not `gene`. You need to include the gene column from the `TcgaTargetGtex_gene_expected_count.gz` file in this selection; otherwise, you will never get a match between genes and sample names. Therefore, the issue lies upstream, in the data source. I can't download your file since I'm not in the lab, but you need to inspect the `count.gz` file to see whether it has a gene column.