Separate letters in the same column into different rows
2.4 years ago

Hey everyone,

I have a text file named COGs.txt that looks like this:

COG_category    Element_type    Phylum
LA       Stat     Proteobacteria
E       Stat     Firmicutes
KS       Bact     Proteobacteria
-       Bact     Firmicutes
S       Bact     Firmicutes

My goal is to count how many times each letter occurs in the COG_category column, grouped by Element_type and Phylum. The problem is that some rows have more than one letter in COG_category (for example LA and KS). I know I can use something like

awk 'NR > 1 {print $1}' COGs.txt | grep -o '[A-Z]' | sort | uniq -c > uniq_counts_COGs.txt

This counts the occurrences of each letter, but doesn't group the counts by Element_type and Phylum. Could datamash help? If I group by COG_category, datamash treats a multi-letter entry such as LA as a single value instead of splitting it into separate letters.
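datamash groups on whole field values, so the letters have to be split into separate rows before any grouping. As a sketch that does not need datamash at all, a single awk pass can split and count at once. It assumes whitespace-separated columns with one header line, recreates the sample rows from the question in a scratch directory, and the output name counts_by_group.tsv is only illustrative. A `-` entry is counted like any other character.

```shell
#!/bin/sh
# Sketch (assumes whitespace-separated columns and one header line):
# split COG_category into single characters and count each
# (letter, Element_type, Phylum) combination in one awk pass.
set -eu

# Work on a copy of the sample rows from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
cat > COGs.txt <<'EOF'
COG_category Element_type Phylum
LA Stat Proteobacteria
E Stat Firmicutes
KS Bact Proteobacteria
- Bact Firmicutes
S Bact Firmicutes
EOF

awk 'BEGIN { OFS = "\t" }
     NR == 1 { next }                       # skip the header line
     {
       # one counter per character of COG_category, keyed with the group columns
       for (i = 1; i <= length($1); i++)
         count[substr($1, i, 1) OFS $2 OFS $3]++
     }
     END { for (k in count) print k, count[k] }' COGs.txt |
  LC_ALL=C sort > counts_by_group.tsv   # awk array order is arbitrary, so sort

cat counts_by_group.tsv
```

If GNU datamash is installed, the splitting can stay in awk and the counting can be handed over instead: pipe the tab-separated one-letter rows (letter, Element_type, Phylum) into `datamash -s -g 1,2,3 count 1`.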

Many thanks!

2.4 years ago
grep -v '^COG_category' COGs.txt  | awk '{L=length($1);for(i=1;i<=L;i++) {printf("%s\t%s\t%s\n",substr($1,i,1),$2,$3);} }' | sort | uniq -c
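The answer's pipeline can be checked against the sample rows from the question. This sketch keeps the intermediate one-letter-per-row table (the split the title asks for) in its own file, and rewrites the `uniq -c` output as tab-separated columns with the count last. It also skips rows whose COG_category is `-`, assuming (this is not stated in the question) that `-` means no category was assigned; the file names are illustrative.

```shell
#!/bin/sh
# Sketch of the split-then-count pipeline on the question's sample rows.
# Assumption (not from the question): "-" means no COG category, so those rows are skipped.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
cat > COGs.txt <<'EOF'
COG_category Element_type Phylum
LA Stat Proteobacteria
E Stat Firmicutes
KS Bact Proteobacteria
- Bact Firmicutes
S Bact Firmicutes
EOF

# Step 1: drop the header and write one row per letter of COG_category.
grep -v '^COG_category' COGs.txt |
  awk '$1 != "-" { for (i = 1; i <= length($1); i++) printf("%s\t%s\t%s\n", substr($1, i, 1), $2, $3) }' \
  > letters_per_row.tsv

# Step 2: count identical (letter, Element_type, Phylum) rows, then move the
# count from the front (uniq -c layout) to a final tab-separated column.
LC_ALL=C sort letters_per_row.tsv | uniq -c |
  awk 'BEGIN { OFS = "\t" } { print $2, $3, $4, $1 }' > uniq_counts_COGs.tsv

cat uniq_counts_COGs.tsv
```

Dropping the `$1 != "-"` condition restores the original behaviour of counting `-` as its own category. `LC_ALL=C` makes the sort order independent of the user's locale.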