Finding Protein Sequence Coverage With R Or Python
1
Hello,
I am trying to find protein sequence coverage after pairwise sequence alignment by using the following formula
sequence coverage = (length of aligned residues/length of longest sequence)*100
How can I use this formula in R or in bio python? Is there any package available to calculate proein sequence coverage ?
protein
r
alignment
biopython
• 5.1k views
This function will give the sequence coverage
seq.cov < - function( x) {
if( ! class( x) == "PairwiseAlignedFixedSubject" ) {
message( 'Only supports objects from class PairwiseAlignedFixedSubject' )
stop( )
}
length.string1 < - x@subject@unaligned@ranges@width
length.string2 < - x@pattern@unaligned@ranges@width
align.length1 < - x@pattern@range@width
align.length2 < - x@subject@range@width
if( length.string1 >= length.string2) {
longest.length < - length.string1
} else {
longest.length < - length.string2
}
if( align.length1 >= align.length2) {
align.length < - align.length2
} else {
align.length < - align.length1
}
x.seq.cov < - ( align.length / longest.length ) * 100
return( x.seq.cov)
}
Test function:
library( Biostrings )
s1 < - DNAString( "AGTATAGATGATAGAT" )
s2 < - DNAString( "AGTAGATAGATGGATGATAGATA" )
palign1 < - pairwiseAlignment( s1, s2)
palign2 < -
pairwiseAlignment( s1, s2,
substitutionMatrix =
nucleotideSubstitutionMatrix( match = 2, mismatch = 10, baseOnly = TRUE))
s3 < - DNAString( "AGTATAGATGATAGATGGTCC" )
s4 < - DNAString( "AGTAGATAGATGGATGATAGATA" )
palign3 < - pairwiseAlignment( s3, s4)
palign4 < -
pairwiseAlignment( s3, s4,
substitutionMatrix =
nucleotideSubstitutionMatrix( match = 2, mismatch = 10, baseOnly = TRUE))
seq.cov( palign1)
seq.cov( palign2)
seq.cov( palign3)
seq.cov( palign4)
This should do it.
Login before adding your answer.
Traffic: 1946 users visited in the last hour
How did you generate those alignments? For some programs (e.g. BLAST but also others) this kind of information is provided in the output.
@philippe Thanks for your comment. I used R to do pairwise sequence alignment using Pairwisealignment function(seqinr package) and calculated sequence identity by using pid function(Biostrings package)