Hello,
I was wondering if I can have your help and support for my following question: I have a raw excel file including 417 columns and 8900 rows. The first five columns contain different protein IDs. Column number 6 to 417 (end) contain different experiments. Also, 8900 rows contain proteins (each row is for one individual protein). Column number 6 to 417 is like a matrix which each cell contains a number ranging from 0 to 2000. A snapshot of my dataset is attached at the end of my post.
Now, for each individual raw (protein), I need to calculate as following:
1) First, for columns number 6 to 417, I need to find those columns (experiment) which all their rows are zero.
2) I need to remove such columns with all rows in zero.
3) Then, for each individual raw (protein) except the row header, I need to calculate “number of columns (experiments; from column 6 to 417) with non zero count, and name this column as “Found_experiments” in its header. For example, if a raw/protein has only three columns/experiments with counts of 20, 850, and 300, it should return 3 (not sum of 20, 850, and 300) as only three columns has numbers for that protein.
4) Then, for each row (except header row), I need to calculate “number of columns found with non zero count” from column 6 to 417 (as found in previous step) divided to total number of columns from 6 to 417, and name this column as “Ratio” in its header.
5) Finally, it returns a separate excel file including the same first to five columns (protein IDs) as the original dataset, then with only two extra columns containing the calculations from the steps 3 (name as “Found_experiments”) and 4 (name as “Ratio”).
I need to write the codes for all the above steps in R. As I have basic knowledge in R, I would highly appreciate if you could help me to sort it out. Many thanks for your great help, in advance.
Best regards, Farah
it's hard to understand without seeing a snapshot of your data.
Thanks for your comment. I have attached a snapshot of my data at the end of my post. However, it contains row number 8900 and cc columns up to cc412.
What have you tried?
if you are learning R, consider doing some tutorials first
if you need this for your work/task/thesis, why not using Excel directly?
Thanks for your suggestion. I was wondering if I can have something as below using Excel directly.
Thank you.
check the COUNTIF function
Great. Thanks a lot.