Counting all sample numbers for every gene
2.4 years ago
Nova ▴ 20

Is there a way to automate summing all the counts for every gene, including all of its variants? I need to do this for a large number of genes (more than 10,000).

Variant numbers_cases numbers_control Gene
1 20 55 ABC
2 48 149 ABC
3 20 456 ABC
4 73 55 TET
5 48 77 TET
6 189 454 TET

Output I want:

| Gene | numbers_cases | numbers_control |
| ---- | ------------- | --------------- |
| ABC  | 88            | 660             |
| TET  | 310           | 586             |
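To try the answers below, the example input can be written to test.txt first. This is a minimal sketch that assumes, as the datamash and csvtk -t commands below do, that the file is tab-separated; the file name test.txt is taken from those answers.

```shell
# Write the example table from the question to test.txt, TAB-separated.
# Tabs matter: datamash and csvtk -t split fields on TAB by default.
printf 'Variant\tnumbers_cases\tnumbers_control\tGene\n' > test.txt
# printf reuses its format for every group of 4 arguments -> one row per variant
printf '%s\t%s\t%s\t%s\n' \
  1 20 55 ABC \
  2 48 149 ABC \
  3 20 456 ABC \
  4 73 55 TET \
  5 48 77 TET \
  6 189 454 TET >> test.txt
```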

With datamash:

$ datamash -H -sg4 sum 2 sum 3 < test.txt

GroupBy(Gene)   sum(numbers_cases)  sum(numbers_control)
ABC 88  660
TET 310 586

With csvtk:

$ csvtk -t summary -w 0 -f 2:sum,3:sum -g 4 test.txt | sed '1s/:sum//g'|csvtk -t -s "|" pretty

Gene|numbers_cases|numbers_control
----|-------------|---------------
ABC |88           |660
TET |310          |586

With R data.table:

library(data.table)
df <- fread("test.txt", header = TRUE)
df[, .(numbers_cases = sum(numbers_cases), numbers_control = sum(numbers_control)), by = Gene]

  Gene numbers_cases numbers_control
1:  ABC            88             660
2:  TET           310             586

With base R:

df <- read.table("test.txt", header = TRUE)
aggregate(. ~ Gene, df[, -1], sum)

  Gene numbers_cases numbers_control
1  ABC            88             660
2  TET           310             586

R, datamash and csvtk are not installed by default on most operating systems; you may need to install them via apt, yum, brew, conda, etc.
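If installing extra tools is not an option, awk, which is available on essentially every Unix-like system, can do the same grouping in one pass. This is a minimal sketch, not taken from any answer here: the function name sum_by_gene is hypothetical, and it assumes the column order from the question (Variant, numbers_cases, numbers_control, Gene) with one header line per file.

```shell
# Hypothetical helper: sums numbers_cases ($2) and numbers_control ($3)
# for each Gene ($4) in a single pass over the input.
# Assumes the question's column order and one header line per file;
# awk's default field splitting accepts both tabs and runs of spaces.
sum_by_gene() {
  awk '
    FNR == 1 { next }                        # skip the header line of each file
    NF == 0  { next }                        # ignore blank lines
    !($4 in cases) { order[++n] = $4 }       # remember first-seen gene order
    { cases[$4] += $2; control[$4] += $3 }   # accumulate per-gene totals
    END {
      OFS = "\t"
      print "Gene", "numbers_cases", "numbers_control"
      for (i = 1; i <= n; i++) {
        g = order[i]
        print g, cases[g], control[g]
      }
    }
  ' "$@"
}
```

Run it as `sum_by_gene test.txt` (or pipe data into it). Genes are printed in the order they first appear, so the input does not need to be sorted, and memory grows with the number of distinct genes rather than the number of rows.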


Tidyverse solution (the native |> pipe requires R 4.1 or later):

library("dplyr")

df |>
  group_by(Gene) |>
  summarize(across(!Variant, sum), .groups="drop")
2.4 years ago
Jeremy ▴ 930

Here is a solution in R:

First load the data.

counting = read.csv('counting.csv')

Then create a named list of genes to group by (naming it keeps the output column called Gene instead of Group.1).

gene.list = list(Gene = counting$Gene)

Then sum the counts for each gene.

new.counting = aggregate(x = counting[c('numbers_cases', 'numbers_control')], by = gene.list, FUN = sum)
new.counting