2.2 years ago
prithvi.mastermind
I'm interested in pre-processing RNA-Seq data. I extracted the raw counts from UCSC Xena and performed the following steps to pre-process the data and get it ready for downstream analysis. The plots obtained after batch effect removal and z-score transformation look interesting, but the median line in the z-score transformed data is still not straight. Are there still outliers present? How can I remove the remaining biases/noise to fully clean the data?
### Load the libraries ###
library(EDASeq)
library(NOISeq)
library(edgeR)
library(DESeq2)
library(ggplot2)
library(reshape2)
library(gplots)
library(RColorBrewer)
library(limma)
library(sva)
library(biomaRt)
###Load the read counts and phenotype data###
rawCountTable <- as.matrix(read.delim(file.choose(), row.names=1))
Col_data = read.table(file = "LUSC_Phenotype.txt", header = TRUE, sep = "\t")
### Use DESeq2 for normalization and log-scale (VST) transformation ###
dds = DESeqDataSetFromMatrix(countData = adjusted, colData = Col_data, design = ~ Type)
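keep <- rowSums(counts(dds)) >= 10      # keep genes with at least 10 reads summed over all samples
dds <- dds[keep,]
dds <- DESeq(dds)                       # estimates size factors and dispersions, fits the model
sizeFactors(dds)                        # inspect the per-sample size factors
vsd <- vst(dds)                         # blind VST (ignores the design), e.g. for QC plots
vsd2 <- assay(vst(dds, blind = FALSE))  # design-aware VST, extracted as a plain matrix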
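### Use NOISeq (ARSyN) to remove batch effects from the VST data ###
# PHENO1: sample annotation data frame with one row per column of vsd2 (same order), including a batch_number column
DATA_BC <- readData(vsd2, factors = PHENO1)
# batch = TRUE: 'factor' names the batch column; norm = "n": no further normalization;
# logtransf = TRUE: the VST values are already on a log scale, so ARSyN does not log them again
myPCA <- ARSyNseq(DATA_BC, factor = "batch_number", batch = TRUE, norm = "n", logtransf = TRUE)
DATA_BC_DONE <- assayData(myPCA)$exprs  # batch-corrected expression matrix (genes x samples)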
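### Perform z-score transformation using the scale function ###
# scale() standardizes columns, so transpose first to standardize each gene across samples
transposed_matrix <- t(DATA_BC_DONE)
z_tr_mt <- scale(transposed_matrix)
z_score <- t(z_tr_mt)   # back to genes x samples; each gene now has mean 0 and SD 1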
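The z-score step above standardizes genes, not samples: scale() works on columns, so scale(t(x)) gives every gene mean 0 and standard deviation 1 across samples. If the plot draws one box per sample, it summarizes the columns of the result, and nothing in this transformation forces the per-sample medians to line up, so a median line that is not perfectly flat is not by itself evidence of outliers. The minimal base-R sketch below uses a small made-up matrix (expr is hypothetical toy data) purely to illustrate this.

```r
# Toy log-scale matrix: 5 genes (rows) x 4 samples (columns), made-up values
set.seed(1)
expr <- matrix(rnorm(20, mean = 8, sd = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# Same idiom as in the post: transpose, scale the columns (= genes), transpose back
z_score <- t(scale(t(expr)))

# Every gene now has mean 0 and standard deviation 1 across samples ...
gene_means <- rowMeans(z_score)
gene_sds   <- apply(z_score, 1, sd)

# ... but a per-sample boxplot summarizes columns, and the column medians
# are not constrained by this transformation
sample_medians <- apply(z_score, 2, median)
print(round(sample_medians, 3))
```

Checking sample_medians (or a sample-to-sample correlation heatmap) on the real z_score matrix is a quicker guide to suspicious samples than the flatness of the median line alone.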
Could you explain what is in the adjusted variable? You never declare it before using it on this line:
dds = DESeqDataSetFromMatrix(countData = adjusted, colData = Col_data, design = ~ Type)
Actually, adjusted was a mistake; it should be rawCountTable. In your view, is the data clean enough to proceed with downstream analysis, or does it need more cleaning?
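Since rawCountTable goes straight into DESeqDataSetFromMatrix(), it may be worth checking it first: DESeq2 expects non-negative integer counts, with one column per sample in the same order as the rows of colData. Some UCSC Xena expression files are distributed as log2(count + 1) values rather than raw counts, so it is worth confirming which one was downloaded; the back-transformation below is only appropriate if the dataset description says the values were stored that way. Also note that read.delim() converts "-" in column names to "." by default (check.names = TRUE), so TCGA-style barcodes may not match the phenotype file character for character. The helper check_counts() and the toy matrices are hypothetical, written in base R for illustration.

```r
# Hypothetical helper: basic sanity checks before DESeqDataSetFromMatrix()
check_counts <- function(counts, col_data, sample_col = NULL) {
  problems <- character(0)
  if (anyNA(counts)) problems <- c(problems, "missing values")
  if (any(counts < 0, na.rm = TRUE)) problems <- c(problems, "negative values")
  if (any(counts != round(counts), na.rm = TRUE))
    problems <- c(problems, "non-integer values (already log-transformed?)")
  if (ncol(counts) != nrow(col_data)) {
    problems <- c(problems, "number of count columns differs from rows of colData")
  } else if (!is.null(sample_col) &&
             !identical(colnames(counts), as.character(col_data[[sample_col]]))) {
    problems <- c(problems, "sample names/order differ between counts and colData")
  }
  problems
}

# Toy data (made up): 3 genes x 2 samples of raw counts plus a phenotype table
raw <- matrix(c(0, 5, 120, 3, 0, 87), nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("S1", "S2")))
pheno <- data.frame(sample = c("S1", "S2"), Type = c("Normal", "Tumor"))

# The same counts stored as log2(count + 1)
logged <- log2(raw + 1)

check_counts(raw, pheno, sample_col = "sample")                 # character(0): no problems
check_counts(logged, pheno, sample_col = "sample")              # flags non-integer values
check_counts(raw[, c("S2", "S1")], pheno, sample_col = "sample") # flags the order mismatch

# Undo log2(count + 1) ONLY if that is how the file was stored;
# round() removes floating-point error from the back-transformation
recovered <- round(2^logged - 1)
```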
I always thought batch correction was done before normalization with DESeq2. Can someone correct me if I'm wrong?
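On the ordering question: the limma documentation for removeBatchEffect() states that it is not intended to be used prior to linear modelling, and that for linear modelling it is better to include the batch factors in the model. The DESeq2 equivalent is to keep the raw counts and put batch in the design formula (for example design = ~ batch_number + Type, assuming Col_data also carries the batch_number column). Batch-corrected matrices are typically produced from transformed data such as the VST output, for plots and other steps that need a corrected matrix, which is the order used in the post. The base-R sketch below illustrates the regression idea behind limma::removeBatchEffect() on made-up data: fit each gene on biology plus batch, then subtract only the batch part. It is a swapped-in illustration, not ARSyN, and remove_batch_sketch() is a hypothetical name.

```r
# Hypothetical sketch of regression-based batch removal on log-scale data
# (the idea behind limma::removeBatchEffect), base R only
remove_batch_sketch <- function(logexpr, batch, group) {
  batch <- factor(batch)
  group <- factor(group)
  # Sum-to-zero contrasts: batch columns describe deviations from the average batch
  Xb <- model.matrix(~ batch, contrasts.arg = list(batch = "contr.sum"))[, -1, drop = FALSE]
  Xg <- model.matrix(~ group)            # intercept + biological groups
  X  <- cbind(Xg, Xb)
  # Least-squares fit for all genes at once (samples in rows, genes in columns)
  fit  <- lm.fit(X, t(logexpr))
  beta <- fit$coefficients[colnames(Xb), , drop = FALSE]
  # Subtract only the fitted batch component; biology and residuals are kept
  logexpr - t(Xb %*% beta)
}

# Toy data (made up): 3 genes x 8 samples, batch balanced across groups
set.seed(42)
batch <- rep(c("b1", "b2"), times = 4)
group <- rep(c("Normal", "Tumor"), each = 4)
true  <- matrix(rnorm(3 * 8, mean = 8), nrow = 3)
shift <- ifelse(batch == "b2", 2, 0)     # +2 log2 units in batch b2
logexpr <- true + matrix(shift, nrow = 3, ncol = 8, byrow = TRUE)

corrected <- remove_batch_sketch(logexpr, batch, group)
```

If batch is completely confounded with the biological groups, the model above is rank-deficient and the batch coefficients come back as NA; no regression-based method can separate the two in that case.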