Entering edit mode
9.1 years ago
MatTesla
•
0
Hi all,
I am sure it is a noob question but have some data from different replicates but in some cases I could measure values only in some cases.
How can I select only rows that have values different from zero in at least 3 columns (replicates)?
Unfortunately I am not sure how to do this yet.
An example of the data:
Cbx5 0 19 0 11
Vcl 0 19 0 0
U5-ENSMUSG00000064430 0 18 74 168
Tmbim6 0 18 7 5
Prcp 0 18 2 0
Ncl 0 18 0 1
U4-ENSMUSG00000064606 0 18 0 0
AY172335.5 0 170 38 98
Mest 0 17 5 10
Cdh1 0 17 4 6
Spint2 0 17 2 8
Uggt1 0 17 2 0
Acly 0 17 2 0
Thank you for your help.
Nice go at a one-liner solution, but I think this will give a different answer, it will print if fewer than three of the replicates have a '0' value.
To find only rows with at least three replicates that are non-zero, this awk command should work:
OP asked "values different from zero in at least 3 columns"
My solution counts zeros, if there are less than 2 (e.g. at least 3 fields have non-zero value), the line is printed.
Your solution counts non-zeros. If there are 3 or more, the line is printed.
I think with OP's data we arrive at the same exact output with both strategies (now that I edited the answer).
There are four data columns, so less-than-three could be two, which means that only two have non-zero values. I'm assuming (and you did too in your for loop) that OP didn't mean to count the label column as a non-zero.
Haha, I'm too tired. You're right. I fixed it :)