I have a table of protein data that looks like this:
ASV1 K L A G G S A Q A I K M P A A Q
ASV4 K L A G G S A Q A I K M P A D Q
ASV3 K L A G G S A Q A I K M P A A Q
ASV2 K L A G G S A Q S I K M P A N Q
ASV5 K L A G G S A Q A I K M P A A Q
I want to convert this table into a binary form, where the most common amino acid at each site is represented by 1 and any less common variant at that site is represented by -1, like this:
ASV1 1 1 1 1 1 1 1 1 1 1 1
ASV4 -1 1 1 1 1 1 -1 1 1 1 1
ASV3 1 1 1 1 1 1 -1 1 1 1 1
ASV2 1 1 -1 1 1 1 1 1 1 1 1
ASV5 1 1 1 1 1 -1 1 1 1 -1 1
Is there any way to do this in R?
Can you explain your example more? For example, why is there a -1 in the first position of ASV4?
My mistake. I didn't derive the example binary pattern from the example sequence data when typing it out, so the two don't match up.
You can edit the question so that others can help you better.
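In the meantime, here is a minimal base R sketch of one way to do the conversion, using the sequence table from the question. It assumes the data can be read as a whitespace-separated table with the ASV name in the first column, and that "common" means the most frequent residue in each column. The object names (`txt`, `aa`, `consensus`, `binary`) are made up for illustration. Ties between equally frequent residues are broken alphabetically here, which may not be what you want.

```r
# Sequence table copied from the question: ASV name, then one residue per site.
txt <- "ASV1 K L A G G S A Q A I K M P A A Q
ASV4 K L A G G S A Q A I K M P A D Q
ASV3 K L A G G S A Q A I K M P A A Q
ASV2 K L A G G S A Q S I K M P A N Q
ASV5 K L A G G S A Q A I K M P A A Q"

# Read as character so residues such as T or F are not parsed as logicals.
aa <- as.matrix(read.table(text = txt, row.names = 1,
                           colClasses = "character"))

# Most frequent residue at each site (column).
# table() sorts residues alphabetically and which.max() takes the first
# maximum, so ties go to the alphabetically first residue (an assumption).
consensus <- apply(aa, 2, function(site) names(which.max(table(site))))

# Compare every column against its consensus residue:
# 1 where the residue matches, -1 where it does not.
binary <- ifelse(sweep(aa, 2, consensus, "=="), 1L, -1L)

print(binary)
```

With the example data, the only -1 entries should be ASV2 at site 9 (S instead of A) and ASV4 and ASV2 at site 15 (D and N instead of A). To run it on a real file, replace `text = txt` with the file path.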