Hello,
Are there any tools to count occurrences of an amino-acid motif in a protein sequence, like this:
string_count("AAA", "AA")  # count "AA" in the sequence "AAA"
[1] 2  # result: 2 occurrences of "AA"
I used the stringr package, but it gives this:
library(stringr)
str_count("AAA", "AA")
[1] 1  # result: only 1 "AA"
But I also want to give a reasonably higher score to longer peptides. Are there any tools for this?
Thanks a lot
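For the counting part, here is a minimal base-R sketch (no packages) of the overlapping count being asked for. It slides a window of the motif's width across the sequence and compares every substring to the motif, so overlapping matches are all counted. The helper name `count_overlapping` is hypothetical, and it assumes exact (fixed-string) matching on a single sequence.

```r
# Hypothetical helper: count overlapping, exact occurrences of `pattern` in `x`.
count_overlapping <- function(x, pattern) {
  n <- nchar(x)
  k <- nchar(pattern)
  if (k == 0 || k > n) return(0L)                  # nothing can match
  starts <- seq_len(n - k + 1)                      # every possible window start
  windows <- substring(x, starts, starts + k - 1)   # all length-k substrings
  sum(windows == pattern)                           # windows may overlap
}

count_overlapping("AAA", "AA")        # 2
count_overlapping("AAAAAA", "AA")     # 5
count_overlapping("AARAAGAAN", "AA")  # 3
```

For a vector of peptides, something like `vapply(peptides, count_overlapping, integer(1), pattern = "AA")` should apply it to each one. With stringr itself, a zero-width lookahead such as `str_count(x, "(?=AA)")` is, I believe, another way to count overlapping matches, but that is untested here.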
It is not very clear what your problem is and what you want to achieve.
Is the result of str_count correct for you? Or do you want str_count("AAA","AA") to return 2 counts?
What kind of score do you want to apply? Could you share an example?
Thanks Bastien, and sorry my question was unclear. I want str_count("AAA","AA") to return 2 counts. For instance, "AAAAAA" should give 5 counts and "AARAAGAAN" should give 3 counts. This counting strategy gives continuous peptides (like "AAAAAA") more counts.
See ATpoint's answer for the count part. As for the score, if you want to play it dirty you can divide the number of counts by the peptide length, or create your own scoring strategy using the start and end positions in the result of matchPattern, for example increasing the score until df[end] < df[start+1]+1, or something similar.
This is advice from a real expert! Yes, a scoring strategy is what I want to do after all the exploratory analysis.
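The two scoring ideas above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the answerer's method: `match_starts`, `normalized_score` and `run_sizes` are hypothetical helpers, and `match_starts` stands in for the start positions you would otherwise read from matchPattern's result. The "dirty" score is the match count divided by peptide length; `run_sizes` groups matches whose start positions are consecutive (each match begins one residue after the previous one), which is one way to detect continuous stretches such as "AAAAAA".

```r
# Hypothetical helper: start positions of all (possibly overlapping)
# exact matches of `pattern` in `x`.
match_starts <- function(x, pattern) {
  n <- nchar(x)
  k <- nchar(pattern)
  if (k == 0 || k > n) return(integer(0))
  starts <- seq_len(n - k + 1)
  starts[substring(x, starts, starts + k - 1) == pattern]
}

# "Dirty" score: number of matches divided by peptide length.
normalized_score <- function(x, pattern) {
  length(match_starts(x, pattern)) / nchar(x)
}

# Sizes of runs of matches whose start positions differ by exactly 1.
run_sizes <- function(x, pattern) {
  s <- match_starts(x, pattern)
  if (length(s) == 0) return(integer(0))
  run_id <- cumsum(c(TRUE, diff(s) != 1))  # start a new run when the gap is not 1
  tabulate(run_id)                          # number of matches in each run
}

normalized_score("AAAAAA", "AA")     # 5/6, about 0.833
normalized_score("AARAAGAAN", "AA")  # 3/9, about 0.333
run_sizes("AAAAAA", "AA")            # 5
run_sizes("AARAAGAAN", "AA")         # 1 1 1
```

How to turn the run sizes into a final score (for example, weighting longer runs more heavily) is left open, as in the thread.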
Hi boaty,
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.
Thanks!
Sorry about that. It's done now.