4.0 years ago
deep771992chanda
Hi friends!!! I have a relative abundance table in .tsv format where samples are in columns and rows contain the features (pathways). Something like this reproducible example. (Sorry, I am unable to put the table here in question, thus inserting the link.)
Now, I want to keep only the features (i.e. pathways) that have an abundance >0.0001 in at least 10% of samples. Can you please tell me how I can do that? I mean, can you please suggest a bash command to achieve this?
Thanks, dpc
Hi friend. This would be easier to do in R.
This is not well-defined. Is this mean abundance across all samples?
No, no Kevin. If you sum each column you will get a total of 1.00, which means each cell of a column is the relative abundance of the corresponding row, i.e. pathway. The values in the cells are already relative abundances, so no further calculation is needed; rows just need to be kept or removed. I want to keep only those rows where at least 10% of the cells contain a value >0.0001.
For example, say I have 2 rows and 20 columns. In the first row, 2 cells (i.e. 10% of all cells in the row) have values >0.0001 and the other 18 cells have values <0.0001. We will keep this row in the output table. Suppose the second row has only one value >0.0001. This row will be removed, because at least 10% of its cells, i.e. at least 2 cells, should contain values >0.0001. Thanks
Just create a boolean matrix and then count the number of TRUE and FALSE per row. Something like:
..then, go from there.
Even better, this will return a single boolean vector of rows (genes) to keep, i.e. those where at least 10% of samples have values > 0.0001.
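Since the original question asked for a bash command, here is a minimal awk sketch of the same filter. It is not the R code referred to in this thread, just one possible way to count, per row, the cells above the threshold and keep the row if that count reaches 10% of the samples. It assumes a tab-separated file with one header line and the pathway name in the first column. The file names toy.tsv and filtered.tsv, the awk variables thr and pct, and the toy values are all made up for illustration.

```shell
# Build a small hypothetical table: 10 samples, 3 pathways (values are invented).
{
  printf 'pathway\tS1\tS2\tS3\tS4\tS5\tS6\tS7\tS8\tS9\tS10\n'
  printf 'pwA\t0.5\t0\t0\t0\t0\t0\t0\t0\t0\t0\n'               # 1 of 10 cells > 0.0001
  printf 'pwB\t0.00005\t0.0001\t0\t0\t0\t0\t0\t0\t0\t0\n'      # no cell strictly > 0.0001
  printf 'pwC\t0.4\t0.9\t1\t0\t0\t0\t0\t0\t0\t0\n'             # 3 of 10 cells > 0.0001
} > toy.tsv

# thr = abundance threshold, pct = minimum percentage of samples above it.
awk -F'\t' -v thr=0.0001 -v pct=10 '
NR == 1 { print; next }                 # pass the header line through unchanged
{
    n = 0
    for (i = 2; i <= NF; i++)           # column 1 holds the pathway name
        if ($i + 0 > thr) n++           # count cells strictly above the threshold
    # keep the row if n / (number of samples) >= pct / 100;
    # written with integers to avoid floating-point rounding
    if (n * 100 >= pct * (NF - 1)) print
}' toy.tsv > filtered.tsv

cat filtered.tsv
```

On this toy table the header, pwA and pwC are kept, and pwB is dropped because none of its values is strictly greater than 0.0001. For a real table, replace toy.tsv with your own file and adjust thr and pct as needed.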