Question

Change a dataset of variants into binary form?

24 months ago

pearl2070 ▴ 10

I have a table of protein data that looks like this:

ASV1     K  L  A  G  G  S  A  Q  A  I  K  M  P  A  A  Q  
ASV4     K  L  A  G  G  S  A  Q  A  I  K  M  P  A  D  Q 
ASV3     K  L  A  G  G  S  A  Q  A  I  K  M  P  A  A  Q
ASV2     K  L  A  G  G  S  A  Q  S  I  K  M  P  A  N  Q
ASV5     K  L  A  G  G  S  A  Q  A  I  K  M  P  A  A  Q

I want to convert this table into binary form, where the most common amino acid at each site is coded as 1 and any less common variant at that site is coded as -1, like this:

ASV1     1 1 1 1 1 1 1 1 1 1 1 
ASV4     -1 1 1 1 1 1 -1 1 1 1 1
ASV3     1 1 1 1 1 1 -1 1 1 1 1 
ASV2     1 1 -1 1 1 1 1 1 1 1 1 
ASV5     1 1 1 1 1 -1 1 1 1 -1 1

Is there any way to do this in R?
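For the rule stated above (1 for the most common residue at a site, -1 otherwise), a minimal base-R sketch might look like the following. It assumes that "common" means the most frequent residue in each column, that ties are broken by whichever residue table() lists first (alphabetical order), and that all sequences are aligned to the same length. The names seqs and consensusBinary are illustrative only, not from any package.

```r
# Rows are sequences (ASV IDs), columns are aligned sites.
seqs <- rbind(
  ASV1 = strsplit("K L A G G S A Q A I K M P A A Q", " ")[[1]],
  ASV4 = strsplit("K L A G G S A Q A I K M P A D Q", " ")[[1]],
  ASV3 = strsplit("K L A G G S A Q A I K M P A A Q", " ")[[1]],
  ASV2 = strsplit("K L A G G S A Q S I K M P A N Q", " ")[[1]],
  ASV5 = strsplit("K L A G G S A Q A I K M P A A Q", " ")[[1]]
)

# Hypothetical helper: code each residue as 1 if it matches the most
# frequent residue in its column and -1 otherwise.
consensusBinary <- function(m) {
  # Most frequent residue per column; on a tie, which.max() picks the
  # first name in table()'s (alphabetical) order.
  consensus <- apply(m, 2, function(col) names(which.max(table(col))))
  # Repeat each consensus residue nrow(m) times so it lines up with the
  # column-major layout of the matrix, then compare element-wise.
  out <- ifelse(m == rep(consensus, each = nrow(m)), 1L, -1L)
  dimnames(out) <- dimnames(m)
  out
}

binary <- consensusBinary(seqs)
binary
```

With the five example sequences, every entry is 1 except ASV4 at site 15 (D) and ASV2 at sites 9 (S) and 15 (N). Working on a character matrix keeps the comparison vectorized, so no explicit loop over rows or sites is needed.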

Updated 23 months ago by zx8754 12k • written 24 months ago by pearl2070 ▴ 10

Can you explain your example further? For example, why is there a -1 in the first position of ASV4?

23 months ago by mohammadhassanj ▴ 260

My mistake: I typed out the example binary patterns without basing them on the example sequence data, so they don't match up.

23 months ago by pearl2070 ▴ 10

You can edit the question so that others can help you better.

23 months ago by mohammadhassanj ▴ 260

GenoMax · Answer 1 · 2023-01-14

23 months ago

Hamid Ghaedi 3.3k

Edited 2023-01-16

I'm not sure whether a ready-made function exists for this, but the following function should work:

binaryVar = function(vR, vV){ # vR: the reference string, vV: the variant string
  vRef = unlist(strsplit(trimws(vR), "\\s+"))      # split the string into its elements (on one or more spaces)
  vVariant = unlist(strsplit(trimws(vV), "\\s+"))  # split the string into its elements (on one or more spaces)
  # the function assumes vRef and vVariant have the same length
  stopifnot(length(vRef) == length(vVariant))
  element_count = length(vRef)
  binary <- character(element_count)  # define 'binary' before the loop
  for(index in seq_len(element_count)){
    binary[index] = ifelse(vRef[index] == vVariant[index], "1", "-1")
  }
  return(binary)
}


#Test
ASV1 = c('K L A G G S A Q A I K M P A A Q')
ASV4 = c('K L A G G S A Q A I K M P A D Q')

asv4_binary = binaryVar(ASV1, ASV4)
asv4_binary
[1] "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "-1" "1"

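If your data are in a data frame, you can iterate over its rows, apply this function to each row (comparing it with the reference sequence), collect each result in a list, and then convert the list to a data frame. Note that the function codes each site relative to the reference string you pass in, not relative to the most common amino acid at that site; the two coincide when the reference is the consensus sequence.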
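The row-wise approach described above could be sketched as follows. This is a minimal illustration under stated assumptions: each sequence is stored as a space-separated string in a data frame (the names df, id, seq, ref, rows and res are hypothetical), ASV1 is taken as the reference, and binaryVar is restated in a compact vectorized form so the block runs on its own.

```r
# Compact, vectorized restatement of binaryVar(), repeated here only so
# this block is self-contained.
binaryVar <- function(vR, vV) {
  vRef <- unlist(strsplit(trimws(vR), "\\s+"))
  vVariant <- unlist(strsplit(trimws(vV), "\\s+"))
  stopifnot(length(vRef) == length(vVariant))
  ifelse(vRef == vVariant, "1", "-1")
}

# Hypothetical input: one row per ASV, sequence as a space-separated string.
df <- data.frame(
  id  = c("ASV1", "ASV4", "ASV3", "ASV2", "ASV5"),
  seq = c("K L A G G S A Q A I K M P A A Q",
          "K L A G G S A Q A I K M P A D Q",
          "K L A G G S A Q A I K M P A A Q",
          "K L A G G S A Q S I K M P A N Q",
          "K L A G G S A Q A I K M P A A Q"),
  stringsAsFactors = FALSE  # keep strings as character (needed before R 4.0)
)

ref <- df$seq[1]  # assumption: ASV1 serves as the reference sequence

# Iterate over the rows, storing one result vector per row in a list.
rows <- lapply(seq_len(nrow(df)), function(i) binaryVar(ref, df$seq[i]))

# Convert "1"/"-1" to integers, bind into a matrix, then into a data frame
# with one column per site.
res <- as.data.frame(do.call(rbind, lapply(rows, as.integer)))
rownames(res) <- df$id
res
```

Because ASV1 happens to match the most common residue at every site in this example, the result has -1 only for ASV4 at site 15 and for ASV2 at sites 9 and 15.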

23 months ago by Hamid Ghaedi 3.3k

Thanks! But the test is returning an error when I run it:

asv4_binary = binaryVar(ASV1, ASV4)
Error in binaryVar(ASV1, ASV4) : object 'binary' not found

Updated 23 months ago by GenoMax 148k • written 23 months ago by pearl2070 ▴ 10

Oops! I just updated the function, and it should work now. The error occurred because the 'binary' variable was not defined before the loop ran.