How to count the unique differing letters in each column?
13 months ago
star ▴ 350

I have a table like the one below. How can I count, for every column except the first (query), the number of unique letters that differ from the letter in row 1?

Input:

              query letter_1 letter_2 letter_3 letter_4 letter_5 letter_6 letter_7 letter_8 letter_9
1 lcl|Query_10001        M        E        K        I        V        L        L        F        A
2 lcl|Query_10002        M        E        K        I        G        K        L        L        S
3 lcl|Query_10003        M        E        K        I        M        L        L        L        A

Output:

            query letter_1 letter_2 letter_3 letter_4 letter_5 letter_6 letter_7 letter_8 letter_9
1 lcl|Query_10001        M        E        K        I        V        L        L        F        A
2 lcl|Query_10002        M        E        K        I        G        K        L        L        S
3 lcl|Query_10003        M        E        K        I        M        L        L        L        A
4     differences        0        0        0        0        2        1        0        1        1
R • 500 views

Read up on XY problems. That is what you are doing here, as well as in your previous question: "how calculate different amino acids in a aligning format?"

It looks like you want residue-level counts of unique letters per position (a simple form of conservation score), which is a common problem in the alignment context. Search online; R is not the best way to do this.

13 months ago
ATpoint 85k

You should provide data as dput() output rather than pasted tables; that makes it easier to copy. Here is a simple solution:


data <- data.table::fread(text="              query letter_1 letter_2 letter_3 letter_4 letter_5 letter_6 letter_7 letter_8 letter_9
lcl|Query_10001        M        E        K        I        V        L        L        F        A
lcl|Query_10002        M        E        K        I        G        K        L        L        S
lcl|Query_10003        M        E        K        I        M        L        L        L        A",
                          data.table=FALSE)

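# For each letter column (everything except query): number of distinct
# letters minus one, i.e. how many letters other than the first occur.
r <- apply(data[, 2:ncol(data)], 2, function(x) length(unique(x))) - 1
# Append the counts as row 4; they are stored as text because the columns hold letters.
data[4, ] <- c("differences", as.numeric(r))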

data
            query letter_1 letter_2 letter_3 letter_4 letter_5 letter_6 letter_7 letter_8 letter_9
1 lcl|Query_10001        M        E        K        I        V        L        L        F        A
2 lcl|Query_10002        M        E        K        I        G        K        L        L        S
3 lcl|Query_10003        M        E        K        I        M        L        L        L        A
4     differences        0        0        0        0        2        1        0        1        1
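
For comparison, here is a minimal base R sketch of the same idea that keeps the counts as a separate integer vector instead of appending them as a text row. It also shows a second possible reading of "versus row 1": the number of rows whose letter differs from row 1, which is not the same thing when a differing letter repeats (see letter_8). The object names seqs, letters_only, n_other_letters and n_rows_differing are made up for this illustration, and the table is rebuilt from the three sequences in the question so that the block can be pasted as is.

```r
# Rebuild the example table from the three sequences in the question
# (hypothetical object names; this stands in for dput() output).
rows <- c("MEKIVLLFA", "MEKIGKLLS", "MEKIMLLLA")
mat  <- do.call(rbind, strsplit(rows, ""))
colnames(mat) <- paste0("letter_", seq_len(ncol(mat)))
seqs <- data.frame(
  query = c("lcl|Query_10001", "lcl|Query_10002", "lcl|Query_10003"),
  mat,
  stringsAsFactors = FALSE
)

# Work on the letter columns only (drop the query column).
letters_only <- seqs[, -1]

# Reading 1: how many distinct letters other than the row 1 letter occur
# in each column. This equals length(unique(x)) - 1.
n_other_letters <- vapply(
  letters_only,
  function(x) length(setdiff(x, x[1])),
  integer(1)
)

# Reading 2: how many rows (below row 1) carry a letter different from row 1.
n_rows_differing <- vapply(
  letters_only,
  function(x) sum(x[-1] != x[1]),
  integer(1)
)

print(n_other_letters)
print(n_rows_differing)
```

Keeping the counts in their own vector avoids turning numbers into text, which happens when a "differences" row is appended to columns that hold letters.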