22 months ago
smrutimayipanda
I have a text file in which the contents are the following:
3 synonymous_variant
1 missense_variant
1 EFFECT
1 downstream_gene_variant
6 missense_variant
2 upstream_gene_variant
2 synonymous_variant
1 EFFECT
1 downstream_gene_variant
4 missense_variant
3 synonymous_variant
1 upstream_gene_variant
1 EFFECT
1 downstream_gene_variant
3 synonymous_variant
3 missense_variant
1 EFFECT
4 synonymous_variant
3 missense_variant
1 EFFECT
1 downstream_gene_variant
6 missense_variant
1 synonymous_variant
1 EFFECT
1 downstream_gene_variant
3 missense_variant
1 EFFECT
1 downstream_gene_variant
4 synonymous_variant
4 missense_variant
1 EFFECT
2 missense_variant
1 upstream_gene_variant
From this, I need the following result:
missense_variant: its total
downstream_gene_variant: its total
upstream_gene_variant: its total
....etc
I tried it but did not get the correct result. Can anyone please tell me how to do it in Python, shell, or any other language? Thanks in advance!
What have you tried? This should be straightforward in awk. With R, this should be even simpler.
I tried with Python, but it was giving me the total of all variants. Can you please tell me how to do it using awk?
What did you try with Python? Did you make a dict keyed on column two and then sum column one for each unique key?
I did this:
Please give the command in awk. It would be really helpful.
No. It's a good exercise for you. Search online on how to use awk dictionaries.
Please let others comment on this. Thanks for your time.
I'm not stopping anyone from commenting - most people are ignoring the post, I'm simply taking the time to tell you that you're better off following a certain path.
You have the columns inverted - shouldn't you be doing
value, name = line.strip().split()
?
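For reference, here is a minimal sketch of the dict approach suggested above, with the unpacking order corrected so the count comes first and the label second. The function name `sum_counts` and the inline `sample` list are made up for illustration, and the sketch assumes every data line is exactly "count label" separated by whitespace; anything else is skipped.

```python
from collections import defaultdict


def sum_counts(lines):
    """Sum the first column for each distinct label in the second column."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: skip blank or malformed lines
        value, name = fields  # count is column one, label is column two
        totals[name] += int(value)
    return dict(totals)


# The first six lines of the file from the question, used as sample input.
sample = [
    "3 synonymous_variant",
    "1 missense_variant",
    "1 EFFECT",
    "1 downstream_gene_variant",
    "6 missense_variant",
    "2 upstream_gene_variant",
]

for name, total in sum_counts(sample).items():
    print(name, total)
```

Since an open file object yields lines, the same function should work on the real file by passing the file handle instead of the list. Note that the `EFFECT` header lines are counted like any other label here; filter them out if they are not wanted.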