Question

How to count amino acid differences at each position of an alignment?

13 months ago

star ▴ 350

I have a protein alignment data table like the one below. I would like to calculate, for each amino acid position, the number of differences between query 1 and the other queries.

For example, the protein sequence starts with "ME" and ends with "HL". At position 5 there is a difference: Query_10003 has "M" where query 1 has "V". So I would expect a data frame like:

df <- data.frame(difference=c(0,0,0,0,1,........))

Input:

          query                                                                               amino_acids
1   lcl|Query_10001                              MEKIVLLFAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKKHNGKLCDL
2   lcl|Query_10002                              MEKIVLLLSVVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDL
3   lcl|Query_10003                              MEKIMLLLAATGLVKSDHICIGYHANNSTKQVDTIMEKNVTVTHAQDILEKTHNGKLCDL
4                                               
5   lcl|Query_10001                              DGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPVNDLCYPGDFNDYEELKHL
6   lcl|Query_10002                              NGVKPLILKDCSVAGWLLGNPMCDEFISVPEWSYIVERANPANDLCYPGNLNDYEELKHL
7   lcl|Query_10003                              NGVKPLILKDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPANGLCYPGSFNDYEELKHL

Thank you in advance for any help!

R • 415 views

updated 13 months ago by Ram 44k • written 13 months ago by star ▴ 350

You're looking for residue-level conservation scores; what you've posted here is an XY problem. Unless you need to use R for this, there are better tools out there.
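
If it does have to be R, a minimal base-R sketch of the per-position count described in the question might look like the following. It assumes the table is a data frame with the columns `query` and `amino_acids` as printed above, that the blocks of each query appear in alignment order, and that all queries have the same total length. The names `aln`, `seqs`, `mat`, `ref` and `others` are made up for illustration, and this is a plain mismatch count, not a proper conservation score.

```r
# Toy input copied from the question: two alignment blocks, three queries each.
aln <- data.frame(
  query = rep(c("lcl|Query_10001", "lcl|Query_10002", "lcl|Query_10003"), times = 2),
  amino_acids = c(
    "MEKIVLLFAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKKHNGKLCDL",
    "MEKIVLLLSVVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDL",
    "MEKIMLLLAATGLVKSDHICIGYHANNSTKQVDTIMEKNVTVTHAQDILEKTHNGKLCDL",
    "DGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPVNDLCYPGDFNDYEELKHL",
    "NGVKPLILKDCSVAGWLLGNPMCDEFISVPEWSYIVERANPANDLCYPGNLNDYEELKHL",
    "NGVKPLILKDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPANGLCYPGSFNDYEELKHL"
  ),
  stringsAsFactors = FALSE
)

# Drop empty separator rows (such as row 4 in the question), if any are present.
aln <- aln[nzchar(trimws(aln$query)), ]

# Join the blocks of each query into one full-length sequence (row order is kept).
seqs <- vapply(split(aln$amino_acids, aln$query), paste, character(1), collapse = "")

# Character matrix: one row per query, one column per alignment position.
# Assumes every query has the same total length.
mat <- do.call(rbind, strsplit(seqs, ""))

# Reference query (assumption: "query 1" means lcl|Query_10001).
ref <- "lcl|Query_10001"
others <- mat[rownames(mat) != ref, , drop = FALSE]

# For each position, count how many of the other queries differ from the reference.
# t(others) has one row per position, so the reference vector recycles correctly.
df <- data.frame(
  position   = seq_len(ncol(mat)),
  difference = rowSums(t(others) != mat[ref, ])
)

head(df$difference, 5)  # first five positions: 0 0 0 0 1, as in the question
```

Note that `difference` here counts how many other queries disagree with query 1 at a position, so it can exceed 1 (position 8 gives 2 with this input). If only a 0/1 flag is wanted, `as.integer(df$difference > 0)` would give that.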

13 months ago by Ram 44k