I have a very large binary matrix, stored as a big.matrix to conserve memory (it would be over 2 GB otherwise: roughly 5 million columns and 100 rows). A smaller reproducible example:
r <- 100    # number of rows (samples)
c <- 10000  # number of columns (features); the real data has ~5 million
m4 <- matrix(sample(0:1, r * c, replace = TRUE), r, c)
m4 <- cbind(m4, 1)  # append a constant column so there is something to remove
m4 <- bigmemory::as.big.matrix(m4)
I need to remove every column which has only one unique value (in this case, only 0s or only 1s). Because of the number of columns, I want to be able to do this in parallel.
How can I accomplish this while keeping the data stored as a big.matrix? I can convert it to a data.frame and loop over the columns counting unique values, but that takes too much RAM.
Thanks!
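One possible approach (an untested sketch, not a definitive solution): since the data are binary, a column is constant exactly when its column sum is 0 or nrow(m4). The sketch below assumes the big.matrix is shared (the default for as.big.matrix), so parallel workers can re-attach it via describe()/attach.big.matrix(); the two-worker cluster and the chunking are arbitrary illustrative choices.

library(bigmemory)
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

desc <- describe(m4)                                 # shareable descriptor of the big.matrix
cols <- seq_len(ncol(m4))
chunks <- split(cols, cut(cols, 2, labels = FALSE))  # one block of column indices per worker

keep <- foreach(idx = chunks, .combine = c, .packages = "bigmemory") %dopar% {
  x <- attach.big.matrix(desc)              # re-attach the shared matrix inside the worker
  s <- colSums(x[, idx, drop = FALSE])      # column sums of this chunk (0/1 data)
  idx[s > 0 & s < nrow(x)]                  # keep columns that are neither all 0 nor all 1
}
stopCluster(cl)

m5 <- bigmemory::as.big.matrix(m4[, sort(keep), drop = FALSE])  # filtered copy

Note that each chunk is pulled into ordinary RAM while it is scanned, and the final subsetting line materializes the kept columns in RAM before converting back; for the full 5-million-column matrix, bigmemory::deepcopy() with its cols argument may be preferable, though I have not verified that here.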
EDIT: It is bioinformatics, as each column is actually a protein subsequence. I am running Fisher's exact test to select important features, but before that I must remove features that are present in all samples.
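For illustration, a minimal sketch of that downstream test, using a hypothetical binary case/control label vector y (invented here purely for the example):

y <- sample(0:1, nrow(m4), replace = TRUE)  # hypothetical sample labels
feature <- m4[, 1]                          # one protein-subsequence feature
fisher.test(table(feature, y))              # 2x2 contingency-table test
# A constant feature would give a 1x2 table, which fisher.test rejects;
# hence the need to filter such columns first.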
This is purely an R question. How is it bioinformatics?
Hello jackarnestad!
We believe that this post does not fit the main topic of this site. For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.
Please tell us how this is related to bioinformatics and we will reopen the question. If you disagree, please tell us why in a reply below; we'll be happy to talk about it.
Cheers!
I addressed the bioinformatics aspect in my edit. Thanks!
Thanks for clarifying. This is indeed a question applied to bioinformatics, but R questions like this might get a quicker answer on Bioconductor support or Stack Overflow. You may still be lucky and find someone here who can help, so let's wait a bit before cross-posting...
Could you include in your code the package where big.matrix is defined?
Added it to the code: bigmemory.