Change a dataset of variants into binary form?
22 months ago
pearl2070 ▴ 10

I have a table of protein data that looks like this:

ASV1     K  L  A  G  G  S  A  Q  A  I  K  M  P  A  A  Q  
ASV4     K  L  A  G  G  S  A  Q  A  I  K  M  P  A  D  Q 
ASV3     K  L  A  G  G  S  A  Q  A  I  K  M  P  A  A  Q
ASV2     K  L  A  G  G  S  A  Q  S  I  K  M  P  A  N  Q
ASV5     K  L  A  G  G  S  A  Q  A  I  K  M  P  A  A  Q 

I want to convert this table into a binary form, where the common/usual amino acid at each site is represented by 1 and the other/less common variants are represented by -1, like this:

ASV1     1 1 1 1 1 1 1 1 1 1 1 
ASV4     -1 1 1 1 1 1 -1 1 1 1 1
ASV3     1 1 1 1 1 1 -1 1 1 1 1 
ASV2     1 1 -1 1 1 1 1 1 1 1 1 
ASV5     1 1 1 1 1 -1 1 1 1 -1 1

Is there any way to do this in R?

SNPs variants format binary R • 1.5k views

Can you explain your example in more detail? For instance, why is there a -1 in the first position of ASV4?


My mistake. I didn't derive the example binary pattern from the example sequence data when typing it out, so the two tables don't match up.


You can edit the question so that others can help you better.

22 months ago

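Edited 2023-01-16: I'm not sure whether there is a built-in way to do this, but the following function should work. It compares a variant string against a reference string position by position: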

binaryVar = function(vR, vV){ # vR: the reference string, vV: the variant string
  vRef = unlist(strsplit(vR, " ")) # Split the string into its elements
  vVariant = unlist(strsplit(vV, " ")) # Split the string into its elements
  element_count = length(vRef) # The function assumes vRef and vVariant have the same length
  binary <- character(element_count) # Define 'binary' before the loop fills it
  for(index in seq_len(element_count)){
    binary[index] = ifelse(vRef[index] == vVariant[index], "1", "-1")
  }
  return(binary)
}


#Test
ASV1 = c('K L A G G S A Q A I K M P A A Q')
ASV4 = c('K L A G G S A Q A I K M P A D Q')

asv4_binary = binaryVar(ASV1, ASV4)
asv4_binary
[1] "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "-1" "1"

If you have a data frame, you can apply this function to each row, collect the output of each iteration in a list, and then convert the list to a data frame.
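As a rough sketch of the whole-table case, the example below works on all five sequences from the question at once. Note that it swaps in a different reference: instead of comparing against ASV1, it takes the most common residue in each column, which is what the question asks for. The object names (`seqs`, `aa`, `consensus`, `binary`, `binary_df`) are made up for illustration, and ties are assumed to go to the alphabetically first residue.

```r
# Example data copied from the question: one space-separated string per ASV
seqs <- c(
  ASV1 = "K L A G G S A Q A I K M P A A Q",
  ASV4 = "K L A G G S A Q A I K M P A D Q",
  ASV3 = "K L A G G S A Q A I K M P A A Q",
  ASV2 = "K L A G G S A Q S I K M P A N Q",
  ASV5 = "K L A G G S A Q A I K M P A A Q"
)

# Split each string into residues and stack the results into a character matrix
# (rows = ASVs, columns = sites)
aa <- do.call(rbind, strsplit(seqs, " ", fixed = TRUE))

# Most common residue at each site; table() sorts residues alphabetically,
# so which.max() resolves ties in favour of the alphabetically first one
consensus <- apply(aa, 2, function(site) names(which.max(table(site))))

# 1 where a residue matches the column consensus, -1 otherwise
binary <- ifelse(sweep(aa, 2, consensus, FUN = "=="), 1L, -1L)

# Optional: convert the matrix to a data frame with one row per ASV
binary_df <- as.data.frame(binary)
```

With this data, the only -1 entries should be site 15 of ASV4 and sites 9 and 15 of ASV2. If you would rather compare against a fixed reference such as ASV1, replace `consensus` with `aa["ASV1", ]`.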


Thanks! But the test is returning an error when I run it:

asv4_binary = binaryVar(ASV1, ASV4)
Error in binaryVar(ASV1, ASV4) : object 'binary' not found

Oops! I just updated the function, and it should work now. The problem was that the 'binary' variable was not defined before the loop ran.
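For anyone who hits the same error, here is a small illustration of the cause. In R, assigning by index into a name that does not exist yet fails, because R has to look up the existing object before it can modify one element. The function names `fill_undefined` and `fill_defined` and the variable `result_vec` are hypothetical, and the sketch assumes no object called `result_vec` exists in your workspace.

```r
# Fails: 'result_vec' is never created, so the indexed assignment cannot find it
fill_undefined <- function(n) {
  for (i in seq_len(n)) result_vec[i] <- "1"
  result_vec
}
err <- tryCatch(fill_undefined(3), error = function(e) conditionMessage(e))

# Works: the vector is created (and preallocated) before the loop fills it
fill_defined <- function(n) {
  result_vec <- character(n)
  for (i in seq_len(n)) result_vec[i] <- "1"
  result_vec
}
ok <- fill_defined(3)
```

Here `err` holds the error message from the first version, while `ok` is a character vector of three "1" strings.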


It works now, thank you!
