Hi guys,
I have what is probably an easy task, but my R knowledge is not good enough. I have a column of COG annotation categories, and some rows have multiple categories:
A
A
B|Q
B|Q
B|Q
R|P|G|E
R|P|G|E
R|P|G|E
I would like to split them (thus removing the | separator, which I managed using awk) and then concatenate all of the (here) 4 resulting columns into a single one, so that I can count the total frequency of each category. I mention R only because I'm going to make a plot afterwards, but awk or similar is also very welcome. Thanks a lot, S
Can you post what the data.frame looks like currently?
Oh, sorry. I edited the previous post. That's one column (called COG_CATEGORY) of a CSV file with many more columns and thousands of rows; I copied just a few to give an idea. And this is what I would like:
To finally have:
Input:
Output:
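For anyone who wants something to try, here is a minimal sketch of one way to do the split-and-count in the shell. It is only an assumption about a workable approach, not necessarily the commands from this reply. The function name count_cog is hypothetical, and it assumes the COG_CATEGORY values have already been extracted from the CSV, one per line, with no header.

```shell
# Hypothetical helper (not from the original thread): count how often each
# COG category occurs. Assumes one COG_CATEGORY value per line on stdin,
# e.g. "B|Q", with the CSV header already removed.
count_cog() {
  tr '|' '\n' |      # put every category on its own line
    LC_ALL=C sort |  # bring identical categories together
    uniq -c          # collapse each group to "count category"
}

# Demo on the sample rows from the question.
printf '%s\n' 'A' 'A' 'B|Q' 'B|Q' 'B|Q' 'R|P|G|E' 'R|P|G|E' 'R|P|G|E' | count_cog
```

On the sample rows this should report 2 for A and 3 for each of B, E, G, P, Q and R. The two-column result can then be read into R for plotting.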
That's awesome! I should learn and use those three commands more often, thanks a lot!