Quantile normalization in huge matrices
9.2 years ago

I would like to apply quantile normalization to a huge matrix. I tried turning my matrix into an ff object, as in the example below:

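library(ff)

# Small example matrix (the real one is ~600k rows x 3k columns)
df <- 'sample1 sample2 sample3
1834.2 1743.4 1384
4711 4922 4650
4555 1387 4650.8
2588 1325 3258'
df <- read.table(text=df, header=TRUE)

# Write it to disk, then read it back as an on-disk ffdf object
write.table(df, "del.txt", col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t")
df <- read.table.ffdf(file="del.txt", header=TRUE)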

Then I tried:

library(preprocessCore)
df <- normalize.quantiles(df)

And got:

Error in normalize.quantiles(df) : Matrix expected in normalize.quantiles

I know it is possible to convert the ff object to a matrix and apply the normalization, but that is exactly what I am trying to avoid.

If I convert the ff object to a matrix and try to normalize it, it fails (after more than 20 hours of running!) with the following error:

    Error in unlist(x, recursive = FALSE) :
      long vectors not supported yet: memory.c:1648

I would be grateful for suggestions on how to perform this normalization, with or without the ff package. Thank you!

Update, in reply to cpad0112:

Thank you for your answer!

However, when I ran it on the ff object (from the ff package), I got:

 quantile_normalisation(df)
Error in aperm.default(X, c(s.call, s.ans)) :
  invalid first argument, must be an array

My matrix is too large to convert with as.data.frame (600k rows by 3k columns). Can the function be adapted to work on an ff object?

9.2 years ago

It seems you want to avoid converting it to a matrix. The following code may help:

1) Save the following code from http://davetang.org/muse/2014/07/07/quantile-normalisation-in-r/ in a separate file, so it can be loaded into the R session with source():

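quantile_normalisation <- function(df){
  # Rank the values within each column (ties get the lowest rank)
  df_rank <- apply(df, 2, rank, ties.method="min")
  # Sort each column independently, then average across each row of the
  # sorted table to get the reference distribution
  df_sorted <- data.frame(apply(df, 2, sort))
  df_mean <- apply(df_sorted, 1, mean)

  # Look up the reference value for each rank
  index_to_mean <- function(my_index, my_mean){
    return(my_mean[my_index])
  }

  df_final <- apply(df_rank, 2, index_to_mean, my_mean=df_mean)
  rownames(df_final) <- rownames(df)
  return(df_final)
}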

2) Run the following code:

Note that I saved the code from Dave Tang's blog as quantile.r and the example lines above as del.txt.

# read.table splits on any whitespace and reads the columns as numeric,
# so no as.numeric() conversion is needed afterwards
df <- read.table(file="del.txt", header=TRUE)

# Use Dave Tang's code
source("quantile.r")
quantile_normalisation(df)

3) In addition, I ran the following code. On this example, the values from Dave Tang's code and from the code below are exactly the same:

library(preprocessCore)
normalize.quantiles(as.matrix(df), copy = F)
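For reference, the function in step 1 can be written as follows. The notation is introduced here for illustration: $X$ is the $n \times m$ data matrix with one column per sample, $x_{(i)j}$ is the $i$-th smallest value in column $j$, and $r_{kj}$ is the rank of $x_{kj}$ within column $j$ (ties get the lowest rank, as with `ties.method="min"`).

$$
\bar{x}_{(i)} = \frac{1}{m} \sum_{j=1}^{m} x_{(i)j}, \qquad
\tilde{x}_{kj} = \bar{x}_{(r_{kj})}
$$

On the example data this gives $\bar{x} \approx (1514.4,\ 2411,\ 3649.47,\ 4761.27)$, so sample1 $= (1834.2,\ 4711,\ 4555,\ 2588)$, with ranks $(1, 4, 3, 2)$, becomes $(1514.4,\ 4761.27,\ 3649.47,\ 2411)$. Because $\bar{x}$ is a sum over columns, it can be accumulated one sorted column at a time, so the full matrix never has to be in RAM at once.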

My session info is:

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] preprocessCore_1.30.0

loaded via a namespace (and not attached):
[1] tools_3.2.2

Oh okay. I probably didn't understand your question properly. I don't have the resources for such large computations, but here are a few pointers:

1) You cannot run ordinary R functions on ff objects. For example, look at the structure of the ff object you created with read.table.ffdf(): quantile normalization functions from other packages will fail on it.

2) An ffdf object is a special object type used within the ff package (as I understand it).


Yes, as far as I understand, ff objects are hybrid objects that remain almost entirely on the hard disk instead of in RAM.

There are several ff functions that mimic ordinary R functions (mergeffdf, subsetffdf, ...), which is why I was wondering whether this quantile normalization function could be adapted as well.
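On whether the function can be adapted: quantile normalization only needs one column at a time plus one reference vector, so it can be split into two passes over the columns. Below is a minimal base-R sketch under that assumption, not a tested ff solution. `quantile_normalise_by_column`, `get_col(j)` and `put_col(j, values)` are hypothetical names introduced here (they are not part of ff or preprocessCore); the callbacks read and write a single column. For an ffdf, `get_col` might wrap something like `df[[j]][]`, which is an assumption to check against the ff documentation. The sketch also assumes there are no missing values and uses `ties.method = "min"` like the davetang function.

```r
# Two-pass, column-at-a-time quantile normalisation (sketch).
# Only one column plus the reference vector are held in RAM at a time.
# get_col(j) / put_col(j, values) are hypothetical accessor callbacks:
# here they index an in-memory matrix; for on-disk data they would read
# or write one column. Assumes no NA values (sort() drops NAs).
quantile_normalise_by_column <- function(n_rows, n_cols, get_col, put_col) {
  # Pass 1: sum the sorted columns to build the reference distribution
  # (the row means of the column-sorted matrix)
  ref <- numeric(n_rows)
  for (j in seq_len(n_cols)) {
    ref <- ref + sort(get_col(j))
  }
  ref <- ref / n_cols

  # Pass 2: replace each value by the reference value at its rank
  for (j in seq_len(n_cols)) {
    x <- get_col(j)
    put_col(j, ref[rank(x, ties.method = "min")])
  }
  invisible(ref)
}

# Usage on the small example matrix from the question
m <- matrix(c(1834.2, 4711, 4555, 2588,
              1743.4, 4922, 1387, 1325,
              1384, 4650, 4650.8, 3258),
            ncol = 3,
            dimnames = list(NULL, c("sample1", "sample2", "sample3")))
out <- m  # normalised columns are written here one at a time
ref <- quantile_normalise_by_column(
  n_rows = nrow(m), n_cols = ncol(m),
  get_col = function(j) m[, j],
  put_col = function(j, v) out[, j] <<- v
)
# ref is approximately c(1514.4, 2411, 3649.47, 4761.27)
```

With 600k rows, one column of doubles takes about 4.8 MB of RAM, and every column is read twice, so the cost shifts from memory to disk I/O.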
