Hi...
I have a dataframe with n columns and n rows. I need to find how many rows contain zero raw read counts across all columns.
Thanks
A question with 13k views and no accepted answer, that is unfortunate.
Assuming the object is called y.
## Example data:
y <- sapply(1:4, function(x) rnorm(500,5,1))
Base R function:
isZero <- base::rowSums(y) == 0
If y is a matrix, you can use the matrixStats package, which is often faster (for rowSums the difference is marginal; I just wanted to mention the matrixStats package):
isZero <- matrixStats::rowSums2(y) == 0
If y is a count matrix from e.g. the single-cell world, stored in e.g. the dgeMatrix (or similar) format (common when working with Bioconductor tools on single-cell data), then use the rowSums method from the Matrix package:
## Matrix package function
isZero <- Matrix::rowSums(y) == 0
isZero is a logical vector that contains TRUE for rows where all samples have zero counts and FALSE otherwise.
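To illustrate the sparse case, here is a minimal sketch that assumes the counts are stored as a dgCMatrix, the class that Matrix::sparseMatrix() returns for numeric input. The object name y.sparse and its values are made up for the example.

```r
library(Matrix)

## Hypothetical sparse count matrix: 4 genes x 3 cells
y.sparse <- sparseMatrix(
  i = c(1, 1, 3),   # row indices of the non-zero entries
  j = c(1, 3, 2),   # column indices of the non-zero entries
  x = c(4, 1, 7),   # the counts themselves
  dims = c(4, 3)
)

## Row sums are computed without converting to a dense matrix
isZeroSparse <- Matrix::rowSums(y.sparse) == 0
isZeroSparse        # logical vector, one element per row
```

In this toy matrix, rows 2 and 4 have no stored entries, so they are the ones flagged as all-zero.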
You can filter your data to remove the only-zero rows with:
y.filtered <- y[!isZero,]
For the number of rows with only zeros, use:
sum(isZero)
For those with not only zeros:
sum(!isZero)
Or both combined:
summary(isZero)
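The rnorm example data will practically never contain an all-zero row, so here is a minimal sketch of the same steps on a small, made-up count matrix (the name counts and its values are purely illustrative). It also shows a variant, rowSums(counts != 0) == 0, which counts non-zero entries rather than summing values and therefore does not rely on all values being non-negative.

```r
## Toy count matrix (hypothetical values): 5 genes x 3 samples
counts <- matrix(
  c(0, 0, 0,
    5, 0, 2,
    0, 0, 0,
    1, 1, 1,
    0, 3, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("sample", 1:3))
)

## TRUE for rows in which every sample has a zero count
isZero <- rowSums(counts) == 0

## Variant that also works if values can be negative (e.g. transformed data):
## count the non-zero entries per row instead of summing the values
isZeroStrict <- rowSums(counts != 0) == 0

sum(isZero)    # number of all-zero rows
sum(!isZero)   # number of rows with at least one non-zero count

## drop = FALSE keeps the result a matrix even if only one row remains
counts.filtered <- counts[!isZero, , drop = FALSE]
```

For raw counts the two versions agree; the second one is only a safeguard for data where positive and negative values could cancel out.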
And what have you tried? Show some effort when asking questions by showing us what you tried and what didn't work.
Have a look at ?rowSums
I deleted the gene name column:
After that I used:
But it is giving output like this:
I am trying to learn R... I want the total number of rows that contain zero read counts across all columns.
help(which)
That's great, but you learn nothing by getting the solution from us. You may not like it, but the only way to learn how to program in R (and any other language) is to fail until you figure it out. It's going to take long, sometimes you will spend hours or days on an issue. But every time you will improve. Motivation: Suck until you don't.
You should write a philosophy book, Wouter. Do you really think it is necessary to answer "try by yourself"? Really? Devon writing help(which) was more helpful than you were in 2 comments.
"Try by yourself" is a winning philosophy in life... Also, the only reason I didn't write that myself is because Wouter already did!
I'm sorry Devon, my point wasn't to be rude to your kid. My point is: is it really necessary to answer that way? If it is, why not program an automatic "Try by yourself" reply for your forum (although this would cost Wouter upvotes)? Not everyone is an "expert"; most people are looking for quick solutions, and it is clear that you will not write their thesis or save their jobs by answering... Or even better, if you're an expert and you are not interested in helping non-experts, why not just ignore them?
Whatever, this is a very helpful forum, and I think comments like that one make no sense. If you don't want to give a quick solution, don't; give advice instead (like help(which)). Make this forum one full of good advice, not one full of upvotes for egocentrism.
I presume that the reference to WouterDeCoster as "my kid" is an error in translation.
Anyway, at least attempting to figure it out yourself is an essential part of science in general and even more so of bioinformatics in particular. The exact same "what have you tried?" procedure is carried out in every wet-lab I've ever seen in the world. So, it's not just on this site that such responses are encouraged; they should be expected in all of science. If people cannot demonstrate enough curiosity to at least make an effort at coming up with an answer themselves, then they should not waste their time pursuing a future in science.
Thanks for the feedback.
If my answer was helpful you should upvote it,
And what if it wasn't?
In contrast to other posts here, you have not contributed at all to this thread. You accuse me of not being helpful, while my first post contained "Have a look at ?rowSums", which is pretty much what OP needs to fix this issue.
I see you made your account quite recently, so welcome to biostars. Perhaps it doesn't make a lot of sense to start criticising others after such a short period of observation. Had you observed for longer, you would know that OP has asked tons of questions on this forum, for every step of the work he is doing. He is not doing himself a favor by copying what we write every time.
Therefore, you will notice that on an "open" question such as this one we will either ask OP to show what they tried, or give just a pointer in the right direction, such as suggesting rowSums() and which, as you can see above.
The mere fact that we spend hours of our time here volunteering to help people doesn't really suggest that we don't want to help people. And if you don't like it here, feel free to go elsewhere. You'll find that biostars is the friendliest bioinformatics community.
You shouldn't summarize my answer as "try by yourself". Nobody can teach you how to ride a bike by showing you pictures of people on a bike. You have to get on the damn bike yourself and hit the ground often before you acquire the skill of riding a bike.
Thanks a lot for the encouraging comments, especially WouterDeCoster. I do not take his advice negatively, because I am at the learning stage and we really do learn things better when we try to resolve problems ourselves, but sometimes help is also needed.
Thanks
I'm cleaning up a few comments here as the discussion seems to have gotten heated and out of hand.
I would prefer the rowSums function, which doesn't require any information about your row names. Suppose you have a data frame called df. You could run this:
I usually use this with the DESeq2 package to filter out genes with low expression.
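The snippet from this post did not survive, so the following is not the poster's code, only a minimal sketch of what a rowSums call on a data frame could look like. It assumes a made-up data frame df with one gene-name column followed by raw count columns; the column names and values are hypothetical.

```r
## Hypothetical data frame: one gene-name column plus raw counts per sample
df <- data.frame(
  gene    = c("geneA", "geneB", "geneC", "geneD"),
  sample1 = c(0, 12, 0, 3),
  sample2 = c(0,  7, 0, 0),
  sample3 = c(0,  9, 1, 0)
)

## Select only the numeric count columns, so the gene names are not summed
count.cols <- sapply(df, is.numeric)
allZero <- rowSums(df[, count.cols]) == 0

sum(allZero)     # how many rows are zero in every column
which(allZero)   # which row numbers those are

## Keep the rows that have at least one non-zero count
df.filtered <- df[!allZero, ]
```

Selecting the numeric columns first avoids the error rowSums raises when a character column such as the gene names is still in the data frame.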
I've moved your post to comment for the reasons above.