4.8 years ago
rgescudero
I want to filter out lines that have zero values in 70% or more of the columns. Suppose I have the following file, “test_awk.txt”:
id sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
gene1 1 2 3 4 5 6 0 0 0 0
gene2 0 0 0 0 0 0 0 0 0 0
gene3 0 0 0 0 1 0 0 0 0 0
gene4 0 0 0 10 0 10 0 0 0 0
gene5 0 0 0 0 0 0 0 0 0 0
gene6 10 10 9 9 9 9 9 9 9 9
gene7 8 8 8 8 8 8 8 8 8 8
gene8 0 0 0 0 1 1 1 0 0 0
gene9 0 0 0 0 0 1 1 1 1 1
I would like to remove lines such as “gene2”, “gene3”, “gene4”, “gene5”, and “gene8”, because they have zero values in 7 or more of the 10 columns. My real-life file is too big to load in R, so I’m trying to use awk, but I’m stuck. Any help would be much appreciated.
Ramon
To make sure that a 0 inside a gene name is not counted as a zero value, I added a gene10 entry by copying the gene9 row and renaming it gene10.
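A minimal awk sketch of one way to do this, wrapped in a POSIX sh function. It assumes whitespace-separated columns, exactly one header line, and the gene id in column 1; the function name filter_zero_rows and its percentage argument are hypothetical, chosen only for illustration. Since gene8 has exactly 7 zeros out of 10, the rows listed for removal imply a cutoff of 70% or more, so the sketch drops a row when at least that share of its value columns are 0. awk reads one line at a time, so memory use should not grow with the size of the file.

```shell
#!/bin/sh
# Minimal sketch of a streaming zero-row filter (not a production tool).
# Assumptions: whitespace-separated columns, exactly one header line,
# and the row id (gene name) in column 1; every other column is a value.

# filter_zero_rows PCT [FILE...]
# Hypothetical helper. Prints the header plus every data row in which
# FEWER than PCT percent of the value columns are 0. Reads stdin if no FILE.
filter_zero_rows() {
    pct="$1"
    shift
    awk -v pct="$pct" '
        NR == 1 { print; next }              # keep the header line as-is
        {
            zeros = 0
            # Start at field 2 so the id column is never tested; this is
            # why the 0 inside a name like gene10 is not counted.
            for (i = 2; i <= NF; i++)
                if ($i == 0) zeros++         # numeric test: "0" and "0.0" both match
            # Same test as zeros / (NF - 1) < pct / 100, but with no division,
            # so a line holding only an id (NF == 1) cannot divide by zero.
            # A row with exactly PCT percent zeros (e.g. 7 of 10) is dropped.
            if (zeros * 100 < pct * (NF - 1)) print
        }' "$@"
}

# Demo on the example table from the question, plus the gene10 row.
# Prints the header followed by gene1, gene6, gene7, gene9 and gene10.
filter_zero_rows 70 <<'EOF'
id sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
gene1 1 2 3 4 5 6 0 0 0 0
gene2 0 0 0 0 0 0 0 0 0 0
gene3 0 0 0 0 1 0 0 0 0 0
gene4 0 0 0 10 0 10 0 0 0 0
gene5 0 0 0 0 0 0 0 0 0 0
gene6 10 10 9 9 9 9 9 9 9 9
gene7 8 8 8 8 8 8 8 8 8 8
gene8 0 0 0 0 1 1 1 0 0 0
gene9 0 0 0 0 0 1 1 1 1 1
gene10 0 0 0 0 0 1 1 1 1 1
EOF
```

On the real file, something like filter_zero_rows 70 test_awk.txt > filtered.txt should work. Testing the value fields one at a time, rather than counting every 0 character on the line, is also what keeps the 0 in gene10, or in a value such as 10, from being counted.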