Hi,
I've got a table with ranked miRNAs from different samples.
  sample1       sample2      sample3       sample4       sample5
1 mmu-mir-21a   mmu-mir-140  mmu-let-7i    mmu-let-7i    mmu-mir-218-2
2 mmu-mir-143   mmu-let-7i   mmu-mir-27b   mmu-let-7f-2  mmu-mir-143
3 mmu-let-7f-2  mmu-mir-143  mmu-let-7f-2  mmu-mir-140   mmu-let-7i
4 mmu-mir-206   mmu-mir-378  mmu-mir-22    mmu-let-7g    mmu-mir-218-1
5 mmu-mir-27b   mmu-mir-99b  mmu-mir-143   mmu-mir-22    mmu-mir-7a-1
...and I would like to make a ranked summary over the whole dataset, to see which miRNAs are the most represented across all samples.
I guess one could call it the sum of rank numbers for each miRNA, counting 0 where a miRNA does not appear in a sample: e.g. for mmu-let-7i it would be 0+2+1+1+3=7; for mmu-let-7f-2 it would be 3+0+3+2+0=8, etc.
Any ideas how to do that?
Interesting question; there is probably a one-liner available with dplyr or something similar. But if I had to keep this simple, I would first add a new column called rank.
Then take the rank of each miRNA in the first sample,
and so on for each sample.
Finally, add them all up.
Of course, this only works if all samples contain exactly the same miRNAs.
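For what it's worth, here is a minimal sketch of the rank-sum idea in plain Python rather than R, using only the five rows shown in the question. The names `ranked` and `rank_sums` are made up for illustration. Since the excerpt does not have the same miRNAs in every sample, the sketch also counts how many samples each miRNA appears in and sorts by that first, then by rank sum; that tie-breaking is my assumption, not something the question specifies.

```python
from collections import defaultdict

# Top-5 lists copied from the table in the question; list position 0 is rank 1.
ranked = {
    "sample1": ["mmu-mir-21a", "mmu-mir-143", "mmu-let-7f-2", "mmu-mir-206", "mmu-mir-27b"],
    "sample2": ["mmu-mir-140", "mmu-let-7i", "mmu-mir-143", "mmu-mir-378", "mmu-mir-99b"],
    "sample3": ["mmu-let-7i", "mmu-mir-27b", "mmu-let-7f-2", "mmu-mir-22", "mmu-mir-143"],
    "sample4": ["mmu-let-7i", "mmu-let-7f-2", "mmu-mir-140", "mmu-let-7g", "mmu-mir-22"],
    "sample5": ["mmu-mir-218-2", "mmu-mir-143", "mmu-let-7i", "mmu-mir-218-1", "mmu-mir-7a-1"],
}


def rank_sums(ranked):
    """Return (sum of ranks, number of samples present) per miRNA.

    A miRNA missing from a sample contributes 0 to its sum,
    as in the worked example in the question.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for mirs in ranked.values():
        for rank, mir in enumerate(mirs, start=1):
            sums[mir] += rank
            counts[mir] += 1
    return dict(sums), dict(counts)


sums, counts = rank_sums(ranked)

# Assumed ordering: present in more samples first, then lower rank sum,
# then name, so that the result is deterministic.
summary = sorted(sums, key=lambda m: (-counts[m], sums[m], m))

for mir in summary:
    print(mir, sums[mir], counts[mir], sep="\t")
```

Without the sample count, a miRNA that shows up once at rank 1 would get a sum of 1 and look better than one ranked near the top in every sample, which is the reason for the caveat about all samples needing the same miRNAs.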