Quantile Normalization in R
6.8 years ago
KVC_bioinfo ▴ 600

Hello All,

I have read counts from RNA-seq data arranged in rows and columns, and I want to quantile-normalize them in R with the code below. It gives me the normalized values, but the output is a matrix without row or column names. I need the output to keep those names so that I can perform PCA on it.

library(preprocessCore)  # provides normalize.quantiles()
data <- read.csv("data.csv", header = TRUE)
head(data)
data_mat <- as.matrix(data[, -1])
head(data_mat)
data_norm <- normalize.quantiles(data_mat, copy = TRUE)

Could someone help me to get that? Thank you in advance.

normalization quantile R Bioconductor • 17k views

Are you saying that your data_norm object has no row or column names after you perform quantile normalisation? What about your data.csv file?


Yes, exactly. The data_norm object has no row or column names after I perform quantile normalization, even though data.csv has them.

6.8 years ago

Try this (note the extra line; also use data.matrix, not as.matrix):

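library(preprocessCore)  # provides normalize.quantiles()
data <- read.csv("data.csv", header = TRUE)
head(data)
rownames(data) <- data[, 1]  # gene IDs in the first column become row names
data_mat <- data.matrix(data[, -1])  # numeric matrix that keeps row and column names
head(data_mat)
data_norm <- normalize.quantiles(data_mat, copy = TRUE)
dimnames(data_norm) <- dimnames(data_mat)  # re-attach names if they were dropped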
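As a cross-check that does not depend on how a particular preprocessCore version handles names, the standard quantile normalization steps can be sketched in base R: sort each column, average the sorted columns row-wise to get a reference distribution, then give each value the reference value at its rank within its column. The function name quantile_normalize and the toy matrix are hypothetical. Ties are broken by position (ties.method = "first"), and normalize.quantiles() may treat ties differently, so results on data with tied values may not match exactly.

```r
# Hypothetical base-R sketch of quantile normalization (not preprocessCore's
# implementation). Ties are broken by position, so tied values may differ
# slightly from preprocessCore::normalize.quantiles().
quantile_normalize <- function(mat) {
  stopifnot(is.matrix(mat), is.numeric(mat), !anyNA(mat))
  # Sort every column, then average across each row of the sorted matrix
  # to obtain the shared reference distribution.
  sorted <- matrix(apply(mat, 2, sort), nrow = nrow(mat))
  reference <- rowMeans(sorted)
  # Work on a copy of the input so its dimnames (gene and sample names) survive.
  out <- mat
  storage.mode(out) <- "double"
  for (j in seq_len(ncol(mat))) {
    # Give each value the reference value at its rank within the column.
    out[, j] <- reference[rank(mat[, j], ties.method = "first")]
  }
  out
}

# Toy counts: 4 genes (rows) x 3 samples (columns).
counts <- matrix(c(5, 2, 3, 4,
                   4, 1, 4, 2,
                   3, 4, 6, 8),
                 nrow = 4,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 c("s1", "s2", "s3")))
normalized <- quantile_normalize(counts)
print(normalized)  # same gene and sample names as counts

# PCA with samples as rows, so the scores are labelled by sample name.
pca <- prcomp(t(normalized))
print(pca$x)
```

Transposing before prcomp() is the design choice that matters for the original goal: PCA treats rows as observations, so with samples in rows the sample names label each point in the score matrix.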

It works. Thank you very much.


You're the best.


Hi Kevin,

I am having this problem too.

data=read.csv("bk.txt", sep="\t", header=T)
head(data)
X adult.endothelial.progenitor.cell alternatively.activated.macrophage
1      ABCG4                              1.17                               1.00
2 AP003391.1                              1.00                               1.00
3      ATP5L                            170.36                             200.45
4      BCL9L                             17.52                               1.74
5  BMPR1APS2                              1.04                               1.05
6     C2CD2L                              4.44                              11.20
rownames(data) <- data[,1]
data_mat <- data.matrix(data[,-1]) 
head(data_mat)
adult.endothelial.progenitor.cell alternatively.activated.macrophage
ABCG4                                   1.17                               1.00
AP003391.1                              1.00                               1.00
ATP5L                                 170.36                             200.45
BCL9L                                  17.52                               1.74
BMPR1APS2                               1.04                               1.05
C2CD2L                                  4.44                              11.20
data_norm <- normalize.quantiles(data_mat, copy = TRUE)
head(data_norm)
 [,1]       [,2]      [,3]       [,4]       [,5]      [,6]       [,7]       [,8]
[1,]   1.316610   1.002034  1.002034   1.006864   1.201017  1.000169   1.316610   1.001017
[2,]   1.003051   1.002034  1.002034   1.006864   1.002034  5.781186   1.002034   1.001017
[3,] 219.738136 219.738136 87.607966 219.738136 219.738136 87.607966 219.738136 219.738136
[4,]  12.947627   1.983136  5.781186   1.201017   4.649492 19.805254   2.767627   5.781186
[5,]   1.201017   1.133051  1.316610   1.006864   1.002034  1.092881   1.002034   1.001017
[6,]   2.767627  25.918475 16.030169   4.649492  25.918475  2.150000  16.030169   2.767627

There are no row or column names in the output. Can you figure out what is wrong? I appreciate your help.


I see that you have also posted this here: Quantile Normalization in R and output data

The row and column order of data_norm is the same as that of data_mat, so you can copy the row and column names of data_mat onto data_norm.
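A minimal sketch of copying the names across, assuming data_mat is the named numeric matrix passed to normalize.quantiles() and data_norm is its unnamed result. The helper restore_dimnames() is hypothetical; its core is a single dimnames() assignment, with a shape check added so that a mismatched matrix fails loudly instead of being silently mislabelled.

```r
# Hypothetical helper (not part of preprocessCore): copy the gene and sample
# names of the input matrix onto a normalized matrix that lost them.
restore_dimnames <- function(normalized, original) {
  # Quantile normalization leaves every value in its original row and column,
  # so copying names is only safe when the two matrices have the same shape.
  stopifnot(identical(dim(normalized), dim(original)))
  dimnames(normalized) <- dimnames(original)
  normalized
}

# Small stand-in for data_mat, using values from the post above.
data_mat <- matrix(c(1.17, 1.00, 170.36,
                     1.00, 1.00, 200.45),
                   nrow = 3,
                   dimnames = list(c("ABCG4", "AP003391.1", "ATP5L"),
                                   c("adult.endothelial.progenitor.cell",
                                     "alternatively.activated.macrophage")))
# unname() mimics a normalized result that came back without names; in the
# real workflow this would be normalize.quantiles(data_mat, copy = TRUE).
data_norm <- unname(data_mat)
data_norm <- restore_dimnames(data_norm, data_mat)
print(head(data_norm))
```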


Hi Kevin, I have RNA-seq data from 3 samples of the same tissue, and I have read counts for every gene from featureCounts, HTSeq, and Cufflinks. What should my data.csv file contain: only the counts, or the gene list plus the counts? Thanks in advance.


featureCounts and HTSeq produce raw counts; Cufflinks would have produced normalised expression values, most likely FPKM.
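To illustrate why FPKM values are not raw counts, here is the textbook FPKM definition (fragments per kilobase of gene length per million mapped fragments) applied to a hypothetical count matrix. This is only a sketch: Cufflinks estimates fragment assignments statistically and uses effective transcript lengths, so its FPKM values will not match this simple calculation. The function name fpkm and the gene lengths are illustrative assumptions.

```r
# Textbook FPKM: count * 1e9 / (gene length in bp * total mapped fragments in
# the sample). Sketch only; Cufflinks' estimates are computed differently.
fpkm <- function(counts, gene_lengths) {
  stopifnot(is.matrix(counts), length(gene_lengths) == nrow(counts))
  # Divide each gene's counts by its length (the vector recycles down columns)...
  per_base <- counts * 1e9 / gene_lengths
  # ...then divide each sample (column) by its total mapped count.
  sweep(per_base, 2, colSums(counts), "/")
}

# Hypothetical raw counts: 2 genes x 2 samples, with gene lengths in bp.
counts <- matrix(c(10, 30,
                   20, 20),
                 nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
gene_lengths <- c(geneA = 1000, geneB = 2000)
print(fpkm(counts, gene_lengths))
```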


My question is: what should my input data.csv file contain for quantile normalization, only the counts or the gene list plus the counts? Thanks in advance.

My data.csv looks like this:

sample1 sample2 sample3 sample4 sample 5
1000 250000 352 5425 5985
1533 54896 5482 6549 6464


It can be any numeric data, usually with samples as columns and genes/probes as rows. If you are attempting to normalise RNA-seq counts with a standard quantile normalisation function, I would not do that. Instead, use one of the published methods, such as those in edgeR or DESeq2, to perform the normalisation.
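On the layout question above, a minimal sketch, assuming a comma-separated file whose first column holds gene IDs and whose remaining columns hold one sample each. The gene names (geneA, geneB) and the inline text are hypothetical stand-ins for data.csv. Reading with read.csv(row.names = 1) keeps the IDs as row names, so the matrix handed to a normalisation function is purely numeric and still labelled.

```r
# Hypothetical stand-in for data.csv: gene IDs in the first column,
# one column of counts per sample (values taken from the post above).
csv_text <- "gene,sample1,sample2,sample3
geneA,1000,250000,352
geneB,1533,54896,5482"

# row.names = 1 moves the gene IDs into the row names, leaving only
# numeric sample columns in the data frame.
counts <- read.csv(text = csv_text, row.names = 1)
count_mat <- data.matrix(counts)
print(count_mat)
```

For a real file, the same call would be read.csv("data.csv", row.names = 1).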
