Similar to this post, I want to filter out all the rows that contain zero values in all columns. I have a file with transcript counts for each sample and replicate. Some transcripts have 0 counts for all samples and replicates, and in other cases only one sample has non-zero counts while all the other samples are zero. What I want to filter out is:
- All transcripts where there are zero counts for all samples and replicates
- All transcripts where there are zero counts for all samples except one (e.g., A and B but not C, A and C but not B, B and C but not A)
For example, input:
A_rep1 A_rep2 B_rep1 B_rep2 C_rep1 C_rep2
s1 0 6 5 3 0 9
s2 66 0 5 32 8 0
s3 0 0 0 0 0 0
s4 8 22 0 4 5 5
Output of task 1:
A_rep1 A_rep2 B_rep1 B_rep2 C_rep1 C_rep2
s1 0 6 5 3 0 9
s2 66 0 5 32 8 0
s4 8 22 0 4 5 5
I've been trying a number of ways to automate the process instead of filtering manually in Excel. My first attempt was in R. It works well for the first task, but when I need to parse the file to process the other tasks it doesn't work.
data=read.table('genes.counts.matrix', header=T)
set1 <- as.matrix(data[,-1])
row.names(set1)<- data[,1]
all <- apply(set1, 1, function(x) all(x[1:16]==0))
newdata <- set1[!all,]
write.table(newdata, "genes.counts.matrix.modified", sep="\t")
Another problem is that the header line in the output starts at the first column, above the transcript names, when it should sit above the counts. It looks like this:
A_rep1 A_rep2 B_rep1 B_rep2 C_rep1 C_rep2
s1 0 6 5 3 0 9
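One possible way to get the header lined up with the counts, sketched here on a toy matrix built from the example values above: `write.table()` leaves out a header field for the row names by default, and passing `col.names = NA` asks it to write an empty one. The object name `newdata` follows the code above; the output path is a temporary file used only for illustration.

```r
# Toy matrix standing in for the filtered counts (values from the example above).
newdata <- matrix(
  c(0, 6, 5, 3, 0, 9,
    66, 0, 5, 32, 8, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"),
                  c("A_rep1", "A_rep2", "B_rep1", "B_rep2", "C_rep1", "C_rep2"))
)

# Hypothetical output path, used here only so the sketch is self-contained.
out_file <- tempfile(fileext = ".tsv")

# col.names = NA writes an empty header field above the row-name column,
# so the sample names end up above the counts.
write.table(newdata, out_file, sep = "\t", quote = FALSE, col.names = NA)

# Read the header line back to check how many fields it has.
header <- strsplit(readLines(out_file)[1], "\t")[[1]]
```

The header now has one more field than there are count columns, with the first field left blank above the transcript names.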
Then I tried a Perl one-liner, but it is not working:
perl -a -nle 'print if "$F[1-16] != 0" ' genes.counts.matrix > genes.counts.matrix.modified
or
perl -a -nle 'print if "$F[1]:$F[16] != 0" ' genes.counts.matrix > genes.counts.matrix.modified
My idea is to first filter out the rows where all columns are equal to zero, next those where columns 1:12 are equal to zero, next those where columns 1:4 and 9:16 are equal to zero, next those where columns 1:8 and 12:16 are equal to zero, and finally those where columns 5:16 are equal to zero.
This was my attempt in R and it didn't work:
> all <- apply(set1, 1, function(x) (all(x[1:16]==0) | all(x[4:16]==0) | all(x[1:12]==0) | (all(x[1:4]==0) & all(x[9:16]==0)) | (all(x[1:8]==0) & all(x[12:16]==0))))
> newdata <- set1[!all,]
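As an alternative to listing column ranges by hand, here is a minimal sketch that groups the columns by sample and keeps a transcript only if at least two samples have a non-zero count in any replicate. It assumes the columns are named `<sample>_rep<n>` as in the example input; the row `s5` is a made-up addition to show the second task, and `sample_of`, `expressed` and `keep` are names invented for this sketch.

```r
# Example input from the question, plus a hypothetical row s5 that has
# counts in sample C only (the case the second task should remove).
counts <- read.table(text = "A_rep1 A_rep2 B_rep1 B_rep2 C_rep1 C_rep2
s1 0 6 5 3 0 9
s2 66 0 5 32 8 0
s3 0 0 0 0 0 0
s4 8 22 0 4 5 5
s5 0 0 0 0 7 2", header = TRUE)

set1 <- as.matrix(counts)

# Assumption: columns are named <sample>_rep<n>, so stripping the suffix
# gives the sample each column belongs to.
sample_of <- sub("_rep[0-9]+$", "", colnames(set1))

# One logical column per sample: TRUE if any replicate is non-zero.
expressed <- sapply(
  split(seq_len(ncol(set1)), sample_of),
  function(cols) rowSums(set1[, cols, drop = FALSE] != 0) > 0
)

# Keep transcripts with counts in at least two samples; this drops both
# the all-zero rows and the rows where only one sample has counts.
keep <- rowSums(expressed) >= 2
newdata <- set1[keep, , drop = FALSE]
```

With this toy input, `s3` (all zeros) and `s5` (counts in C only) are dropped. For a file whose column names do not follow this pattern, `sample_of` would need to be built some other way, for example as a hand-written vector of sample labels.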
Yes!! Your bracket changes worked!! I still get the headers moved to the left, but I am more than satisfied with having the function work. Thanks a million!!
Illinu, I have edited my answer to account for the problem with the header.
This is not working. I get two outcomes:
This one applies the functions but messes up the headers
This one returns the file with the header in place but it does not apply the function
Umm, of course it will not apply the function! The order of the R commands is incorrect. Apply the function first, then use cbind(), and then write it out. My bad, I used 'set1' as the name of the object to write out instead of 'newdata'.
I have made that change. Just follow the order like I have shown.
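For reference, the order described above (filter, then cbind(), then write) could look roughly like the sketch below. The toy matrix, the column name `transcript` and the temporary output path are placeholders for illustration and are not taken from the edited answer.

```r
# Toy count matrix; s3 is all zeros and should be removed.
set1 <- matrix(
  c(0, 6, 5,
    0, 0, 0,
    8, 22, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s3", "s4"), c("A_rep1", "A_rep2", "B_rep1"))
)

# 1. Apply the filter first.
all_zero <- apply(set1, 1, function(x) all(x == 0))
newdata <- set1[!all_zero, , drop = FALSE]

# 2. Then bind the transcript names on as a real first column
#    ("transcript" is a hypothetical column name).
out <- cbind(transcript = rownames(newdata), newdata)

# 3. Finally write 'newdata' (via 'out'), not 'set1', without row names
#    so the header has one field per column.
out_file <- tempfile(fileext = ".tsv")  # hypothetical output path
write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to check the result.
written <- read.table(out_file, header = TRUE, sep = "\t")
```

Writing `out` rather than `set1` is what makes the filter show up in the file, and `row.names = FALSE` keeps the header aligned because the transcript names are now an ordinary column.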
Excellent, that worked!! Thanks a million!