Entering edit mode
7.3 years ago
landscape95
▴
190
Hello, everyone, I am trying to reproduce the lncRNA get from the paper "Portraying breast cancer with lncRNAs" by Olivier et. al. Here I came to the reproducing step:
I don't know whether my code is doing right, can someone give me your comment?
# load RAW files
RAW_files = list.files(path = "./RAWsets", pattern = "_RAW.tar")
# RNA.list = list of RAW file after processed by Frozen Robust Multiarray Analysis
RNA.list = list()
# iterate through the .CEL files
for (RAWstr in RAW_files) {
print(RAWstr)
# extract those raw .tar files
RAWdirectory = paste("./RAWsets", substr(RAWstr, 1, nchar(RAWstr) - 8), sep = "/")
tardir = paste("./RAWsets", RAWstr, sep = "/")
untar(tardir, exdir = RAWdirectory, verbose = F)
# extract those extracted .CEL files
cels = list.files(RAWdirectory, pattern = "CEL.gz", full.names = T)
sapply(cels, gunzip)
# read all cel files to data.raw variable
data.raw = ReadAffy(filenames = substr(cels, 1, nchar(cels)-3), verbose = F, cdfname = "hgu133plus2cdf")
# frozen robust multiarray analysis
data.matrix = frma(data.raw)
# add to RNA.list as described above
tmp = list(data.matrix)
names(tmp) = substr(RAWstr, 1, nchar(RAWstr) - 8)
RNA.list = append(RNA.list, tmp)
}
### Combine into 1 matrix of all data sets
RNAdata = as.matrix(exprs(RNA.list[[1]]))
for (i in 2:length(RNA.list)) {
RNAdata = CombineMatrix(RNAdata, as.matrix(exprs(RNA.list[[i]])))
}
# normalize between arrays on a data of fRMA before
RNAdata_scaled = normalizeBetweenArrays(RNAdata)
# because samples names are GSMxxxxxx.CEL => erase .CEL extension
colnames(RNAdata_scaled) = substr(colnames(RNAdata_scaled), 1, 9)
# load data from table S1, S2, S3, S7
load("tables_set.RData")
# mutual index between total data and tableS1 which have clinical information
tableS1 = tableS1[match(colnames(RNAdata_scaled), tableS1[, 1]), ]
# compute batch data using tumor vs normal tissue as covariate
batch = as.numeric(lapply(tableS1[, 3], function(str) if (str == "normal") return(1) else return(2)))
# applied Combat algorithm in sva library with default parameters to adjust data for batch effects
RNAdata_scaled = ComBat(dat = RNAdata_scaled, batch = batch)
Your help is really appreciated!
Thank you, the thing is I don't know how to set the batch values as the paper's description: "using tumor vs normal as covariate". Could you give me some idea?
Ex.
If you have 4 experiments (represented by 4 GEO accession number), each with 2 replicates and with 2 conditions (tumor and normal), your pdata file will look like this:
Thank you very much, I fix the code to this, could you have a comment? Pheno contains 3 columns, for example: "GSM272133 1 normal" is the sample GSM272133 comes from my first GEO data set GSE10780 (I mark it as 1) and it is a normal sample. "GSM272972 2 cancer" is the sample GSM272972 comes from my second GEO data set GSE10810 (I mark it as 2) and it is a tumor sample. Then I apply this code, Is this ok?
Looks good, then look with a PCA how your samples are clustering now and compare with the clustering pre-batch-effect correction
Thank you. I have another question on "using the limma and fRMA libraries to obtain log2-normalized expression signals for each probe set". Did I do it right in the code, would you please have a look in my code? I am not sure about 2 lines:
and
Thank you!
Never used the frozen RMA. With simple RMA: