Question

What is the best way to pre-process RNA-Seq data?


2.2 years ago

prithvi.mastermind

I'm interested in pre-processing RNA-Seq data. I extracted the raw counts from UCSC Xena and performed the following steps to pre-process the data and get it ready for downstream analysis. The plots obtained after batch-effect removal and z-score transformation look interesting, but the median line in the z-score-transformed data is still not straight. Are there still outliers present? How can I remove the remaining biases/noise to fully clean the data?

```r
#### Load the libraries ###
library(EDASeq)
library(NOISeq)
library(edgeR)
library(DESeq2)
library(ggplot2)
library(reshape2)
library(gplots)
library(RColorBrewer)
library(limma)
library(sva)
library(biomaRt)
```

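```r
### Load the read counts and phenotype data ###
# Count matrix: genes in rows, samples in columns, first column holds the gene IDs
rawCountTable <- as.matrix(read.delim(file.choose(), row.names = 1))
# DESeq2 expects raw integer counts. If the Xena file stores log2(count + 1)
# values instead, back-transform first: rawCountTable <- round(2^rawCountTable - 1)
# Phenotype table: one row per sample, in the same order as the count-matrix columns
Col_data <- read.table(file = "LUSC_Phenotype.txt", header = TRUE, sep = "\t")
```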

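```r
### Use DESeq2 for normalization and variance-stabilizing transformation ###
dds <- DESeqDataSetFromMatrix(countData = rawCountTable, colData = Col_data, design = ~ Type)
# Drop genes with very low total counts before estimating size factors
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]
# Median-of-ratios size factors; vst() only needs these (plus its own
# dispersion fit), so a full DESeq() fit is not required at this stage
dds <- estimateSizeFactors(dds)
sizeFactors(dds)
# blind = FALSE lets the transformation take the design (~ Type) into account
vsd2 <- assay(vst(dds, blind = FALSE))
```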
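
For reference, DESeq2's default size factors come from the median-of-ratios method: each gene's counts are divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios (genes with a zero in any sample are skipped). The base-R sketch below reproduces that calculation on a made-up toy matrix; median_of_ratios is a hypothetical helper, not a DESeq2 function. Applied to the filtered counts, counts(dds), it should give factors comparable to sizeFactors(dds).

```r
# Hypothetical helper: base-R sketch of median-of-ratios size factors.
# counts: genes x samples matrix of non-negative integer counts.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean; any zero count makes it -Inf
  log_geo_means <- rowMeans(log_counts)
  use <- is.finite(log_geo_means)
  # Per sample: median over genes of log(count / geometric mean)
  log_sf <- apply(log_counts[use, , drop = FALSE], 2,
                  function(col) median(col - log_geo_means[use]))
  exp(log_sf)
}

# Toy data (made up): genes 1-3 differ only by sequencing depth, gene4 has a zero
toy <- matrix(c(10, 20, 30,
                20, 40, 60,
                 5, 10, 15,
                 0,  3,  7),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("S", 1:3)))
sf <- median_of_ratios(toy)
# Divide each column (sample) by its size factor
normalized <- sweep(toy, 2, sf, "/")
sf
normalized
```

In the toy matrix the three samples differ only in depth (1:2:3), so after dividing by the size factors genes 1 to 3 have identical values in every sample, and gene4 is excluded from the factor estimate because of its zero.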

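```r
### Use NOISeq's ARSyNseq to remove batch effects from the normalized data ###
# Assumes the phenotype table Col_data (one row per sample, same order as the
# columns of vsd2) is the factor table, and that it has a "batch_number" column
DATA_BC <- readData(vsd2, factors = Col_data)
# norm = "n": no extra normalization; logtransf = TRUE: the VST data are
# already on a log scale, so ARSyNseq does not log-transform them again
myPCA <- ARSyNseq(DATA_BC, factor = "batch_number", batch = TRUE, norm = "n", logtransf = TRUE)
DATA_BC_DONE <- assayData(myPCA)$exprs
```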
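
ARSyNseq removes the batch effect with an ANOVA/PCA-based model (ARSyN). To make the idea of "removing a batch effect" concrete, the sketch below uses a much simpler, different technique: on log-scale data it subtracts each batch's per-gene mean and adds back the overall per-gene mean, which removes additive batch shifts. center_batches and the toy matrix are hypothetical and only illustrate the concept; this is not what ARSyNseq computes.

```r
# Hypothetical helper: remove additive per-gene batch shifts from log-scale data.
# x: genes x samples matrix (e.g. VST values); batch: one label per sample.
center_batches <- function(x, batch) {
  batch <- as.factor(batch)
  out <- x
  gene_means <- rowMeans(x)  # overall mean of each gene across all samples
  for (b in levels(batch)) {
    idx <- which(batch == b)
    # Shift this batch so its per-gene means equal the overall gene means
    out[, idx] <- x[, idx, drop = FALSE] - rowMeans(x[, idx, drop = FALSE]) + gene_means
  }
  out
}

# Toy data (made up): batch B is shifted up by 2 units relative to batch A
x <- rbind(g1 = c(5, 6, 7, 8),
           g2 = c(1, 2, 3, 4))
batch <- c("A", "A", "B", "B")
corrected <- center_batches(x, batch)
corrected
```

Subtracting batch means like this also removes biological signal whenever batch is confounded with Type, so the batch layout should be checked against the phenotype groups before any correction.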

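```r
### Perform z-score transformation using scale() ###
# scale() standardizes columns, so transpose to standardize each gene across samples
transposed_matrix <- t(DATA_BC_DONE)
z_tr_mt <- scale(transposed_matrix)
# Transpose back to genes x samples
z_score <- t(z_tr_mt)
```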
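
On the median line: scale() on the transposed matrix gives every gene mean 0 and standard deviation 1 across samples, but, assuming the plot is one boxplot per sample, nothing forces each sample's median z-score to be 0. The sketch below shows this on simulated data (expr and z are illustrative names, not from the post), so a wavy median line on its own is not necessarily evidence of outliers.

```r
set.seed(1)
# Simulated log-scale matrix: 200 genes x 6 samples (toy data, not the LUSC data)
expr <- matrix(rnorm(200 * 6, mean = 8, sd = 2), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("S", 1:6)))

# Same idiom as the code above: transpose, scale the columns, transpose back
z <- t(scale(t(expr)))

# Each gene (row) now has mean 0 and standard deviation 1 across samples
summary(rowMeans(z))
summary(apply(z, 1, sd))

# The per-sample (column) medians, i.e. the boxplot midlines, are unconstrained
apply(z, 2, median)
```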

[Figure: plots of the data after batch-effect removal and z-score transformation]

Tags: RNA-Seq, z-score, Normalization

Last updated 2.2 years ago by Ram; written 2.2 years ago by prithvi.mastermind


Could you explain what is in the adjusted variable? You never declare it before using it on this line: dds = DESeqDataSetFromMatrix(countData = adjusted, colData = Col_data, design = ~ Type)

Reply by Nicolas Rosewick, 2.2 years ago


Actually, adjusted was written by mistake; it should be rawCountTable. In your opinion, is the data clean enough to proceed with downstream analysis, or does it need more cleaning?

Reply by prithvi.mastermind, 2.2 years ago


I always thought you batch-corrected before normalizing with DESeq2. Can someone correct me if I'm wrong?

Reply by Trivas, 2.2 years ago