Calculate the sequence coverage of a protein from a list of its peptides
7.2 years ago
arronar

Hello out there.

I was wondering whether there is a simple way in R to calculate the coverage of a protein, given its full sequence and a list of peptides derived from it.

For example, let's say we have this protein sequence taken from UniProt:

MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPS
EKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLD
AKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHL
EKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDD
SFRKIYTDLGWKFTPL

and we have a list of some of its peptides, which may or may not overlap one another:

pepts = c("DRRRRMEALLLSLY", "YPNDRKLL", "DYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQA",
          "SEVDVNK", "MCCWVSKFKDAMRRYQGIQ", "TCKIPGK", "VLSDLDAKIKAYNLTVEGVEGFVRYSRVTK",
          "DRRRRMEALLLSLYYPNDRKLL", "SEVDVNKMCCWVSKFK")

Is there a way to calculate the coverage?

Thank you.

Tags: R, protein, coverage, uniprot

This isn't an R solution, but have you considered doing a multiple sequence alignment?


I tried Clustal Omega, but I don't know how to get its results into R, and it doesn't seem to report a coverage percentage.


This isn't my field, but I found two possible solutions on Google. I haven't tested them myself, so try them and see whether they fit your case.

For MS data, the isobar R package may do the job; check its PDF documentation.

I also found a tool called Protein Coverage Summarizer, but it isn't an R package.


Thank you, but neither of them seems to help me.

7.2 years ago

Just use regular expressions to match the peptides to the protein sequence and record an X at each matched position. When all of the peptides have been processed, count the Xs.
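A minimal R sketch of the same idea, assuming exact (literal) peptide matches: instead of writing Xs into a copy of the string, it keeps one logical flag per residue and sets it to TRUE wherever a peptide matches, then reports the percentage of flagged residues. The function name `protein_coverage` is hypothetical; `gregexpr(..., fixed = TRUE)` and `gsub` are base R.

```r
# Hypothetical helper: percent sequence coverage of a protein by a set of peptides.
# Every residue covered by at least one exact peptide match is flagged TRUE
# (the R analogue of "record an X at each matched position").
protein_coverage <- function(protein, peptides) {
  # Drop whitespace/newlines that sneak in when sequences are pasted from FASTA
  protein  <- gsub("[[:space:]]", "", protein)
  peptides <- gsub("[[:space:]]", "", peptides)

  covered <- logical(nchar(protein))  # one flag per residue, all FALSE

  for (pep in peptides) {
    if (nchar(pep) == 0) next         # ignore empty strings
    # fixed = TRUE: treat the peptide as a literal string, not a regex.
    # gregexpr reports non-overlapping occurrences only, so a peptide that
    # overlaps a second copy of itself in the protein is an uncovered edge case.
    hits <- gregexpr(pep, protein, fixed = TRUE)[[1]]
    if (hits[1] == -1) next           # peptide not found in the protein
    for (start in hits) {
      covered[start:(start + nchar(pep) - 1)] <- TRUE
    }
  }

  100 * sum(covered) / length(covered)
}
```

With the sequence from the question collapsed into one string (for example with `paste0()` over its five lines) and the `pepts` vector from the question, `protein_coverage(protein, pepts)` should return 51.5625: together the peptides span residues 13 to 144 of the 256-residue sequence. Because whitespace is stripped first, stray line breaks inside the peptide strings do not break the matching.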


That's just what I would do. I don't think there is a simpler solution.


I guess I have to record both the start and end positions of each match and then sum them up, because some of the peptides may overlap each other.
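Summing match lengths directly would double-count residues where peptides overlap (for example, "DRRRRMEALLLSLY" lies entirely inside "DRRRRMEALLLSLYYPNDRKLL"). If you do want to work with start and end positions, one option is to merge overlapping intervals first and then sum their lengths. A minimal base-R sketch, assuming 1-based inclusive positions (the start as returned by `regexpr(pep, protein, fixed = TRUE)`, and end = start + nchar(pep) - 1); the function name `covered_length` is hypothetical:

```r
# Hypothetical helper: number of residues covered by a set of 1-based,
# inclusive [start, end] intervals, counting each residue only once.
covered_length <- function(starts, ends) {
  if (length(starts) == 0) return(0)
  o <- order(starts)                 # process intervals left to right
  starts <- starts[o]
  ends   <- ends[o]

  total     <- 0
  cur_start <- starts[1]
  cur_end   <- ends[1]
  for (i in seq_along(starts)[-1]) {
    if (starts[i] <= cur_end + 1) {
      # Overlaps or touches the current block: extend it
      cur_end <- max(cur_end, ends[i])
    } else {
      # Gap: close the current block and start a new one
      total     <- total + (cur_end - cur_start + 1)
      cur_start <- starts[i]
      cur_end   <- ends[i]
    }
  }
  total + (cur_end - cur_start + 1)  # add the last open block
}
```

For "DRRRRMEALLLSLY", "YPNDRKLL" and "DRRRRMEALLLSLYYPNDRKLL" (residues 13 to 26, 27 to 34 and 13 to 34), the naive sum is 14 + 8 + 22 = 44, while `covered_length(c(13, 27, 13), c(26, 34, 34))` returns 22. Dividing by the protein length and multiplying by 100 gives the percentage.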


No need to sum anything. Here is a Perl way of doing it:

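use strict;
use warnings;

# Assumes $protein_seq (a string) and @peptides (a list of strings) are already defined
my $cover_seq = $protein_seq; # copy in which we're going to replace matches by X
foreach my $peptide_seq (@peptides) {
    # \Q...\E makes Perl match the peptide literally rather than as a pattern
    if ($protein_seq =~ /\Q$peptide_seq\E/) { # peptide matches the protein
        my $start = $-[0];          # start offset of the match (0-based)
        my $end   = $+[0];          # offset just past the end of the match
        my $len   = $end - $start;  # length of the match
        # Replace the matched residues by Xs in the copy
        substr($cover_seq, $start, $len) = 'X' x $len;
    }
}
# Count the Xs to get the coverage (only the first match of each peptide is marked,
# and the protein sequence itself is assumed to contain no X residues)
my $coverage = ($cover_seq =~ tr/X//) / length($cover_seq) * 100;
printf "Coverage: %.2f%%\n", $coverage;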

Oh I see. Thank you.
