2.4 years ago
genomes_and_MGEs
Hey everyone,
I have a text file named COGs.txt that looks like this:
COG_category Element_type Phylum
LA Stat Proteobacteria
E Stat Firmicutes
KS Bact Proteobacteria
- Bact Firmicutes
S Bact Firmicutes
My goal is to count how many times each letter occurs in the COG_category column, grouped by Element_type and Phylum. The problem is that some rows have more than one letter in COG_category (for example, LA or KS). I know I can use something like:
awk 'NR > 1 {print $1}' COGs.txt | grep -o '[A-Z]' | sort | uniq -c > uniq_counts_COGs.txt
This outputs the number of occurrences of each letter, but it doesn't group the letters by Element_type and Phylum. Maybe datamash would help? The catch is that grouping by COG_category treats a multi-letter entry such as LA as a single category instead of splitting it into its individual letters.
Many thanks!
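For illustration, here is a minimal awk sketch of one way to do this kind of per-group letter count. It assumes the file is whitespace-delimited with a single header line, and that "-" means no category and should be skipped. The function name count_cog_letters is hypothetical, and the demo writes the sample rows from the post to a temporary file rather than touching COGs.txt.

```shell
#!/bin/sh
# Hypothetical helper: count each COG letter per (Element_type, Phylum) group.
# Assumes whitespace-delimited columns and one header line (skipped via NR > 1).
count_cog_letters() {
    awk 'NR > 1 {
        # Walk the first column one character at a time, so "LA" yields L and A.
        for (i = 1; i <= length($1); i++) {
            letter = substr($1, i, 1)
            # Skip anything that is not an uppercase letter, e.g. "-".
            if (letter ~ /^[A-Z]$/)
                counts[letter "\t" $2 "\t" $3]++
        }
    }
    END {
        # One tab-separated line per (letter, Element_type, Phylum) combination.
        for (key in counts)
            print key "\t" counts[key]
    }' "$1" | LC_ALL=C sort
}

# Demo on the sample rows from the post, written to a temporary file.
sample=$(mktemp)
printf '%s\n' \
    'COG_category Element_type Phylum' \
    'LA Stat Proteobacteria' \
    'E Stat Firmicutes' \
    'KS Bact Proteobacteria' \
    '- Bact Firmicutes' \
    'S Bact Firmicutes' > "$sample"
result=$(count_cog_letters "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

On the real file this would be called as count_cog_letters COGs.txt. Each output line holds the letter, Element_type, Phylum and the count, so S is reported separately for Bact/Firmicutes and Bact/Proteobacteria instead of being merged into one total.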