Calculating ratios by group in R
5.3 years ago
Sam ▴ 20

Hi there,

I would like to calculate the ratio of NN calls (total markers / total NN) for different groups (here two groups, A and B, with six samples each) in R. It must be easy, but I couldn't find any example online.

I found a similar answer, but as a newbie to R I couldn't adapt the code. https://stackoverflow.com/questions/48555851/adding-a-row-for-the-ratio-of-two-variables

        A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6
     M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC
     M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA
     M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN
     M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN

Expected output:

        A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6  A-ratio  B-ratio  A+B-ratio
     M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC  -        -        -
     M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  0        -        11
     M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN  1.5      0.7      1.4
     M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN  0        4        1.0

Thanks for your help.

R SNP • 2.2k views

I don't understand what A-ratio, B-ratio, A+B-ratio are supposed to represent. Can you spell out how you arrived at the values in the last three columns of the second row?

5.3 years ago
zx8754 12k

I am guessing you want the missingness (the fraction of NN calls) per marker, within each group and overall. Try this example:

# example data
df1 <- read.table(text = "        A1  A2  A3  A4  A5  A6  B1  B2  B3  B4  B5  B6 
     M1 CC  CC  AC  AA  CC  CC  CC  AA  AC  CC  CC  CC                                          
     M2 NN  AA  AA  AC  AA  AA  AA  AA  AA  AA  AA  AA  
     M3 AA  AA  NN  NN  AA  AA  GG  NN  GG  GG  NN  NN 
     M4 NN  NN  NN  AA  AA  NN  AA  AA  AA  AA  NN  NN", header = TRUE, stringsAsFactors = FALSE)

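# sample (column) names, used to pick out each group by its prefix
x <- colnames(df1)
cbind(df1, 
      # per-group missingness: fraction of "NN" calls in the columns starting with "A" or "B"
      sapply(c("A", "B"), function(i){
        d <- df1[ grepl(paste0("^", i), x) ]
        rowSums(d == "NN")/ncol(d)
        }),
      # overall missingness across all samples
      AB = rowSums(df1 == "NN")/ncol(df1)
      )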
#    A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6         A         B         AB
# M1 CC CC AC AA CC CC CC AA AC CC CC CC 0.0000000 0.0000000 0.00000000
# M2 NN AA AA AC AA AA AA AA AA AA AA AA 0.1666667 0.0000000 0.08333333
# M3 AA AA NN NN AA AA GG NN GG GG NN NN 0.3333333 0.5000000 0.41666667
# M4 NN NN NN AA AA NN AA AA AA AA NN NN 0.6666667 0.3333333 0.50000000
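
A variation on the same idea, in case the group names should not be hard-coded: this is only a sketch, and it assumes that "NN" can be read in as NA (via `na.strings`) and that a sample's group is its name with the trailing digits removed. That naming rule and the object names `grp`, `miss`, `by_group` and `res` are illustrative assumptions, not part of the answer above.

```r
# Same toy data as above; "NN" is read in as NA so that is.na() can be used.
df1 <- read.table(text = "
   A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6
M1 CC CC AC AA CC CC CC AA AC CC CC CC
M2 NN AA AA AC AA AA AA AA AA AA AA AA
M3 AA AA NN NN AA AA GG NN GG GG NN NN
M4 NN NN NN AA AA NN AA AA AA AA NN NN",
  header = TRUE, na.strings = "NN", stringsAsFactors = FALSE)

# Assumed grouping rule: strip trailing digits from the sample name ("A1" -> "A").
grp <- sub("[0-9]+$", "", colnames(df1))

# Logical matrix, TRUE where the genotype call is missing.
miss <- is.na(df1)

# Fraction of missing calls per marker within each group.
by_group <- sapply(split(seq_along(grp), grp),
                   function(idx) rowMeans(miss[, idx, drop = FALSE]))

# Add the overall fraction across all samples.
res <- cbind(by_group, AB = rowMeans(miss))
round(res, 3)
```

On this toy data the three columns should agree with the A, B and AB columns printed above. Working on a logical matrix with `rowMeans()` avoids repeated data frame comparisons, which may matter on a larger table, though that has not been benchmarked here.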

That's what I was thinking also but the output of this doesn't match the presented expected output. Waiting for OP to clarify.


Awesome...thanks a lot @zx8754, it worked well. I will make sure to use dput format for future requests.

Friederike and Jean-Karim Heriche - sorry that the ratios (for missingness) in my expected output were wrong. They came from my entire data set (180x35000); I subsampled it but forgot to recalculate the ratios for the subsample.
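
For reference, sharing data in dput format could look roughly like the sketch below. The two-marker data frame is a made-up subset of the example table, and the round trip through `capture.output()` and `parse()` is only there to illustrate that the printed text can be evaluated back into an R object.

```r
# Small illustrative subset of the example data (hypothetical, for demonstration).
df1 <- data.frame(A1 = c("CC", "NN"),
                  A2 = c("CC", "AA"),
                  B1 = c("CC", "AA"),
                  row.names = c("M1", "M2"),
                  stringsAsFactors = FALSE)

# dput() prints a structure(...) expression that can be pasted into a post.
dput(df1)

# Anyone can rebuild the object by evaluating that text.
txt <- paste(capture.output(dput(df1)), collapse = "\n")
df2 <- eval(parse(text = txt))
all.equal(df1, df2)
```

For a large table such as the 180x35000 one mentioned here, it is usually enough to post `dput()` of a small subset, for example the first few rows and columns.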


If it was helpful, consider accepting it as the answer (the "tick").
