Hi,
I have CD-HIT output that looks like this:
## 73 MIXED Sb-40 4 6 66.66666666666667 cluster_stats(length_max=319, length_min=205, length_mean=296.6666666666667, length_variance=2042.2666666666667, length_stdev=45.19144461805428, length_members_max=317, length_members_min=205, length_members_mean=292.2, length_members_variance=2403.2000000000003, length_members_stdev=49.0224438395313, ident_perc_max=97.48, ident_perc_min=85.85, ident_perc_mean=93.986, ident_perc_variance=22.420530000000017, ident_perc_stdev=4.735032206859845, counter=Counter({'Sb-40': 4, 'Sj-A': 2}))
73 0 319 Sj-A_M02764:115:000000000-C3GKK:1:1118:4143:8248 1
73 1 317 Sj-A_M02764:115:000000000-C3GKK:1:2107:8743:9281 1 317 5 319 + 96.85 0
73 3 317 Sb-40_M02764:115:000000000-C3GKK:1:2104:16139:22698 1 317 5 319 + 97.48 0
73 5 317 Sb-40_M02764:115:000000000-C3GKK:1:2115:7096:7098 1 317 5 319 + 94.01 0
73 2 305 Sb-40_M02764:115:000000000-C3GKK:1:1113:14798:13772 1 305 1 319 + 95.74 0
73 4 205 Sb-40_M02764:115:000000000-C3GKK:1:2106:18903:18118 1 205 1 217 + 85.85 0
Sj-A and Sb-40 are two samples, and I want to know how many fragments from each sample are in each cluster. For example, for cluster 73:
# Sj-A Sb-40
73 2 4
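For what it's worth, the cluster header line shown above already contains these numbers in its `counter=Counter({...})` field. Here is a minimal Python sketch that reads them directly. It assumes every header line starts with `#` and carries that field in exactly this form. The helper name `counts_from_header` is hypothetical.

```python
import ast
import re

# Matches "#"-style header lines: captures the cluster ID (first token)
# and the dict literal inside "counter=Counter(...)".
HEADER_RE = re.compile(r"^#+\s*(\S+)\s.*counter=Counter\((\{.*\})\)")


def counts_from_header(line):
    """Return (cluster_id, {sample: count}), or None if line is not a header."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    cluster_id, counter_text = match.groups()
    # The captured text is a plain dict literal, so literal_eval parses it
    # without executing arbitrary code.
    return cluster_id, ast.literal_eval(counter_text)


# Header line from the example above, shortened in the middle.
header = ("## 73 MIXED Sb-40 4 6 66.66666666666667 cluster_stats(length_max=319, "
          "ident_perc_stdev=4.735032206859845, counter=Counter({'Sb-40': 4, 'Sj-A': 2}))")
cluster_id, counts = counts_from_header(header)
samples = ["Sj-A", "Sb-40"]
print("#", *samples)
# .get(s, 0) prints 0 for a sample that is missing from this cluster's Counter.
print(cluster_id, *(counts.get(s, 0) for s in samples))
```

For the header above this prints `# Sj-A Sb-40` followed by `73 2 4`.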
How should I proceed?
Thanks in advance
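If you prefer to count from the member lines themselves, here is a minimal Python sketch. It assumes the following:

* Columns are separated by whitespace (spaces or tabs).
* The read ID is the fourth column.
* The sample name is everything before the first `_` (e.g. `Sj-A_M02764:...` gives `Sj-A`).
* Cluster header lines start with `#`.

`count_by_sample` is a hypothetical name.

```python
from collections import Counter


def count_by_sample(lines):
    """Count reads per (cluster ID, sample) from CD-HIT member lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()  # any run of spaces or tabs separates fields
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and cluster header lines
        cluster_id, read_id = fields[0], fields[3]
        sample = read_id.split("_", 1)[0]  # "Sj-A_M02764:..." -> "Sj-A"
        counts[(cluster_id, sample)] += 1
    return counts


# The example lines from the question (header line shortened).
example = """\
## 73 MIXED Sb-40 4 6 66.66666666666667 cluster_stats(...)
73 0 319 Sj-A_M02764:115:000000000-C3GKK:1:1118:4143:8248 1
73 1 317 Sj-A_M02764:115:000000000-C3GKK:1:2107:8743:9281 1 317 5 319 + 96.85 0
73 3 317 Sb-40_M02764:115:000000000-C3GKK:1:2104:16139:22698 1 317 5 319 + 97.48 0
73 5 317 Sb-40_M02764:115:000000000-C3GKK:1:2115:7096:7098 1 317 5 319 + 94.01 0
73 2 305 Sb-40_M02764:115:000000000-C3GKK:1:1113:14798:13772 1 305 1 319 + 95.74 0
73 4 205 Sb-40_M02764:115:000000000-C3GKK:1:2106:18903:18118 1 205 1 217 + 85.85 0
""".splitlines()

for (cluster_id, sample), n in sorted(count_by_sample(example).items()):
    print(cluster_id, sample, n)
```

This prints `73 Sb-40 4` and `73 Sj-A 2`. For a real file, pass an open file handle instead, e.g. `with open("clusters.txt") as fh: count_by_sample(fh)`. The file name is a placeholder. Note that this only reports (cluster, sample) pairs that actually occur in the file.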
Hmm… I tried it and got:
## Sb-40 290
## Sj-A 60
0 0 1
0 1 1
0 10 1
0 100 1
0 1000 1
0 1001 1
0 1002 1
0 1003 1
0 1004 1
0 1005 1
etc…
Can you post the file in a code block? I'm not sure what your field separators are (please specify them). They look like single spaces.
Input file:
Output:
Try this:
Quite OK.
I'll sort the rest out in a spreadsheet… thanks!
One issue: it leaves out the 0 values…
Replace [1-9] with [0-9] in the awk command provided by 5heikki.
It still doesn't report when a count is 0.
Basically, the final output I need is something like:
I need to know when one plant has 0 fragments in a cluster.
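Counting only the pairs that occur can never print a 0, because a sample that is absent from a cluster never produces a line to count. One workaround is to collect the full list of samples first. You then look up every (cluster, sample) pair, defaulting to 0.

Below is a minimal Python sketch of this approach. It assumes:

* Columns are whitespace-separated.
* The read ID is the fourth column.
* The sample name is the part before the first `_`.
* Header lines start with `#`.

`sample_table` is a hypothetical name. The second cluster (`74`) and its read IDs are made up for illustration.

```python
from collections import Counter


def sample_table(lines, sample_order=None):
    """Build a cluster x sample count table in which absent samples get 0."""
    counts = Counter()  # (cluster ID, sample) -> number of reads
    clusters = {}       # dict keys keep cluster IDs in first-seen order
    seen_samples = set()
    for line in lines:
        fields = line.split()  # whitespace-separated columns
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and cluster header lines
        cluster_id = fields[0]
        sample = fields[3].split("_", 1)[0]  # "Sj-A_M02764:..." -> "Sj-A"
        clusters.setdefault(cluster_id, None)
        seen_samples.add(sample)
        counts[(cluster_id, sample)] += 1
    # Use the caller's column order if given, so samples that never occur
    # anywhere can still get a column of zeros; otherwise sort by name.
    samples = list(sample_order) if sample_order else sorted(seen_samples)
    # Indexing a Counter with a missing key returns 0 (and does not add it).
    rows = [[cid] + [counts[(cid, s)] for s in samples] for cid in clusters]
    return samples, rows


example = [
    "73 0 319 Sj-A_M02764:115:000000000-C3GKK:1:1118:4143:8248 1",
    "73 1 317 Sj-A_M02764:115:000000000-C3GKK:1:2107:8743:9281 1 317 5 319 + 96.85 0",
    # Hypothetical cluster 74 containing only Sb-40 reads (made-up read IDs).
    "74 0 300 Sb-40_readA 1",
    "74 1 290 Sb-40_readB 1 290 1 300 + 99.00 0",
]

samples, rows = sample_table(example, sample_order=["Sj-A", "Sb-40"])
print("#", *samples)
for row in rows:
    print(*row)
```

This prints `# Sj-A Sb-40`, then `73 2 0` and `74 0 2`. The output is space-separated, so it can be pasted into a spreadsheet and split into columns there.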
I recommend you take this to Stack Overflow (be sure to describe your problem well).