Hi all, hope you experts are doing great! I am currently working with the TCGAbiolinks package (please consider my coding knowledge entry level). I have two sets of barcodes: the first set contains tumor samples with no history of relapse, and the second set contains tumors that relapsed. My experimental design is to compare differentially methylated regions (DMRs) between Set A (no relapse) and Set B (relapse).
I am following the tutorials in the TCGAbiolinks package, which describe the analysis between tumor and normal samples. My question is how to set the condition type. Where should I specify the sample status, such as relapsed or not relapsed? Should I prepare a separate file, like a .csv, or what is the best way to declare sample conditions? The basic code is as follows:
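library(TCGAbiolinks)
# CancerProject, DataDirectory and FileNameData are defined earlier in the tutorial
# (for colon cancer, CancerProject would be "TCGA-COAD")
# Note: current GDC releases serve RNA-seq counts as "STAR - Counts";
# "HTSeq - Counts" has been retired, so this query may return no files
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")
samplesDown <- getResults(query, cols = c("cases"))
# Split barcodes by sample type: TP = primary solid tumor, NT = solid tissue normal
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")
dataSmNT <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "NT")
# Use only 10 samples per group for a quick test run
dataSmTP_short <- dataSmTP[1:10]
dataSmNT_short <- dataSmNT[1:10]
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts",
                      barcode = c(dataSmTP_short, dataSmNT_short))
GDCdownload(query = queryDown,
            directory = DataDirectory)
dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       directory = DataDirectory,
                       save.filename = FileNameData)
# Remove outlier samples using a minimum sample-sample correlation cut-off
dataPrep <- TCGAanalyze_Preprocessing(object = dataPrep,
                                      cor.cut = 0.6,
                                      datatype = "HTSeq - Counts")
# geneInfoHT is a gene annotation table shipped with TCGAbiolinks
dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep,
                                      geneInfo = geneInfoHT,
                                      method = "gcContent")
boxplot(dataPrep, outline = FALSE)
boxplot(dataNorm, outline = FALSE)
# Drop genes whose mean expression is below the 25th percentile
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm,
                                  method = "quantile",
                                  qnt.cut = 0.25)
# mat1 holds the Cond1type samples and mat2 the Cond2type samples,
# so the normal samples go in mat1 here
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, dataSmNT_short],
                            mat2 = dataFilt[, dataSmTP_short],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01,
                            logFC.cut = 1,
                            method = "glmLRT")
# The tutorial repeats the call on all samples; samplesNT and samplesTP
# are not defined in this snippet and must be created before running it
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, samplesNT],
                            mat2 = dataFilt[, samplesTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01,
                            logFC.cut = 1,
                            method = "glmLRT")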
I am completely lost on how to declare the sample conditions so that I can run the DGE (differential gene expression) and DMR analyses between these two sets.
Thanks a lot for your help.
Have a great day ahead!
Dave (Confused)!
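A small sample sheet is one reasonable way to declare the conditions asked about above. A minimal sketch, assuming a hypothetical CSV with columns `barcode` and `condition` (values `NoRelapse`/`Relapse`); the inline text stands in for `read.csv("samples.csv")`, and the barcodes are placeholders, not real samples:

```r
# Hypothetical sample sheet; in practice: sheet <- read.csv("samples.csv")
sheet <- read.csv(text = "barcode,condition
TCGA-XX-0001-01A,NoRelapse
TCGA-XX-0002-01A,Relapse
TCGA-XX-0003-01A,NoRelapse
TCGA-XX-0004-01A,Relapse",
  stringsAsFactors = FALSE)

# One character vector of barcodes per condition; these play the role
# that dataSmTP / dataSmNT play in the tutorial code
setA <- sheet$barcode[sheet$condition == "NoRelapse"]
setB <- sheet$barcode[sheet$condition == "Relapse"]

# Sanity check: no barcode should be in both sets
stopifnot(length(intersect(setA, setB)) == 0)
```

`c(setA, setB)` could then go into the `barcode` argument of `GDCquery` in place of `c(dataSmTP_short, dataSmNT_short)`.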
Thanks a lot for your suggestion, Mathias.heydt. It was indeed very helpful. I have a set of samples that I have classified as Set A and Set B (both from colon cancer). The problem is that I want to bypass the clinical classification: I am not comparing normal vs. tumor, but Set B vs. Set A. Can you please let me know how to go about this? Thanks a lot for your help! (Sorry for my English, I am not a native speaker.) Sincerely, Dave
Yeah, like I said in my answer, if you know the barcodes of the samples in set A and set B, you can just specify them. In the example they build their sets with the TCGAquery_SampleTypes function; you don't need to do that. Just list the barcodes directly. dataSmTP is just a variable name, so you can choose your own, such as setA and setB.
Just make sure that wherever dataSmTP appears in the example, you change it to setA (the new variable name), and likewise replace dataSmNT with setB. That way you choose the barcodes and 'make' your own two sets to compare.
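One catch when listing barcodes by hand: after GDCprepare the column names are typically full aliquot barcodes, longer than 16-character sample barcodes like "TCGA-A6-2671-01A", so `dataFilt[, setA]` can fail with a subscript-out-of-bounds error. A minimal sketch of matching on the first 16 characters instead, using a toy matrix with made-up barcodes in place of the real dataFilt:

```r
# Toy expression matrix standing in for dataFilt; the column names are
# made-up full-length aliquot barcodes
dataFilt <- matrix(1:12, nrow = 2,
  dimnames = list(c("gene1", "gene2"),
                  c("TCGA-XX-0001-01A-01R-0000-07",
                    "TCGA-XX-0002-01A-01R-0000-07",
                    "TCGA-XX-0003-01A-01R-0000-07",
                    "TCGA-XX-0004-01A-01R-0000-07",
                    "TCGA-XX-0005-11A-01R-0000-07",
                    "TCGA-XX-0006-01A-01R-0000-07")))

setA <- c("TCGA-XX-0001-01A", "TCGA-XX-0003-01A")
setB <- c("TCGA-XX-0002-01A", "TCGA-XX-0004-01A")

# Compare only the 16-character sample part of each column name
sample_ids <- substr(colnames(dataFilt), 1, 16)
colsA <- colnames(dataFilt)[sample_ids %in% setA]
colsB <- colnames(dataFilt)[sample_ids %in% setB]

# drop = FALSE keeps a matrix even if a set has a single sample
matA <- dataFilt[, colsA, drop = FALSE]
matB <- dataFilt[, colsB, drop = FALSE]
```

matA and matB would then be passed as mat1 and mat2 to TCGAanalyze_DEA, with Cond1type and Cond2type set to labels of your choosing, for example "SetA" and "SetB".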
But if you don't know all the sample barcodes and need to find them based on a clinical variable, you'll need to download the clinical files and write some more code.
You can also add file or case filters in the Repository tab of the GDC data portal: https://portal.gdc.cancer.gov/repository Look up the clinical or biospecimen variable you need in the data dictionary: https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-entity-list&anchor=clinical Then add a case/biospecimen filter in the portal based on those variables. From there you should be able to produce a list of all barcodes with the clinical/biospecimen traits you are interested in.
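The "write some more code" route could look roughly like this. A minimal sketch on a toy clinical table: the column names `patient` and `relapse` are hypothetical, and the barcodes are placeholders. It uses the TCGA barcode layout, where the first 12 characters identify the patient and characters 14-15 the sample type ("01" = primary solid tumor).

```r
# Toy clinical table with hypothetical column names. A real table could come
# from, e.g., TCGAbiolinks::GDCquery_clinic(project = "TCGA-COAD", type = "clinical"),
# but check its actual column names before reusing this code
clin <- data.frame(
  patient = c("TCGA-XX-0001", "TCGA-XX-0002", "TCGA-XX-0003"),
  relapse = c("no", "yes", NA),
  stringsAsFactors = FALSE)

# which() drops patients whose value is missing (NA)
patientsA <- clin$patient[which(clin$relapse == "no")]
patientsB <- clin$patient[which(clin$relapse == "yes")]

# Map sample barcodes back to patients, keeping primary tumors ("01") only
samples <- c("TCGA-XX-0001-01A", "TCGA-XX-0002-01A", "TCGA-XX-0002-11A")
tumor <- substr(samples, 14, 15) == "01"
setA <- samples[tumor & substr(samples, 1, 12) %in% patientsA]
setB <- samples[tumor & substr(samples, 1, 12) %in% patientsB]
```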
I am following tiagochst's code here, and I am finding it hard to reproduce. I have defined the Left and Right samples (Set A and Set B) as follows:
samplelist_left <- c("TCGA-A6-2671-01A", "TCGA-A6-2674-01A", "TCGA-A6-2674-01B", "TCGA-A6-2674-01A", "TCGA-A6-2675-01A", "TCGA-A6-2685-01A", "TCGA-A6-3807-01A", "TCGA-A6-3810-01B", "TCGA-A6-3810-01A", "TCGA-A6-3810-01A", "TCGA-A6-5656-01B")
samplelist_right <- c("TCGA-4N-A93T-01A", "TCGA-5M-AAT4-01A", "TCGA-5M-AATE-01A", "TCGA-A6-2677-01A", "TCGA-A6-2677-01B", "TCGA-A6-2679-01A", "TCGA-A6-2680-01A", "TCGA-A6-2681-01A", "TCGA-A6-2683-01A", "TCGA-A6-2684-01A", "TCGA-A6-2684-01C", "TCGA-A6-2684-01A", "TCGA-A6-3808-01A", "TCGA-A6-4105-01A", "TCGA-A6-4107-01A", "TCGA-A6-5659-01A")
# Some barcodes are listed twice; keep each one only once
samplelist_left <- unique(samplelist_left)
samplelist_right <- unique(samplelist_right)
#getProbeInfo(mae)
group.col <- "definition"
group1 <- samplelist_left
group2 <- samplelist_right
dir.out <- "result"
Sig.probes <- get.diff.meth(data = mae,
                            group.col = group.col,
                            group1 = group1,
                            group2 = group2,
                            minSubgroupFrac = 0.2,
                            sig.dif = 0.3,
                            diff.dir = "hypo", # Search for hypomethylated probes in group 1
                            cores = 4,
                            dir.out = dir.out,
                            pvalue = 0.01)
But I am getting the following error, and I can't understand where I am going wrong.
Please help me understand where I am going wrong in defining the sample groups. I have managed to create the MAE object as well, but I cannot move forward. Sorry for bugging you, and have a great day! Thanks, Dave
You're doing something quite different from the example, and different from your first question too. The MAE (MultiAssayExperiment) is a separate class, and in the example they create it from SummarizedExperiment objects (lusc.exp and lusc.met). I don't use these classes, so I can't look into it for you, but they set the parameters to names and values that are probably stored in the MAE:
group.col <- "definition"
group1 <- "Primary solid Tumor"
group2 <- "Solid Tissue Normal"
dir.out <- "result"
Whereas you are passing it barcode vectors directly.
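Going by those example values, group.col names a column of the MAE's sample metadata and group1/group2 are values found in that column. A minimal sketch of building such a column from the two barcode lists, assuming the sample identifiers in your MAE start with the 16-character sample barcode (check what they actually look like in your object). The column name "side", the labels "Left"/"Right", and the placeholder id "TCGA-XX-0000-01A" are hypothetical, and the lists are shortened copies of yours:

```r
# Placeholder ids standing in for the sample identifiers of the real MAE
ids <- c("TCGA-A6-2671-01A", "TCGA-A6-2677-01A", "TCGA-XX-0000-01A")

samplelist_left  <- c("TCGA-A6-2671-01A", "TCGA-A6-2674-01A")
samplelist_right <- c("TCGA-A6-2677-01A", "TCGA-A6-2679-01A")

# Label each sample by set; samples in neither list stay NA
side <- rep(NA_character_, length(ids))
side[substr(ids, 1, 16) %in% samplelist_left]  <- "Left"
side[substr(ids, 1, 16) %in% samplelist_right] <- "Right"

# With a real MAE (not run here), store the labels and refer to them by name
# instead of passing barcode vectors:
# colData(mae)$side <- side   # after building side from rownames(colData(mae))
# Sig.probes <- get.diff.meth(data = mae, group.col = "side",
#                             group1 = "Left", group2 = "Right",
#                             diff.dir = "hypo", dir.out = "result")
```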
I suggest you dig into the documentation to figure this out: https://bioconductor.org/packages/release/bioc/manuals/ELMER/man/ELMER.pdf