Question

Finding Protein Sequence Coverage With R Or Python

1

Entering edit mode

13.6 years ago

Lakshmi ▴ 60

Hello,

I am trying to find protein sequence coverage after pairwise sequence alignment by using the following formula

sequence coverage = (length of aligned residues/length of longest sequence)*100

How can I use this formula in R or in bio python? Is there any package available to calculate proein sequence coverage ?

protein r alignment biopython • 5.3k views

ADD COMMENT • link updated 8.4 years ago by Biostar 20 • written 13.6 years ago by Lakshmi ▴ 60

1

Entering edit mode

How did you generate those alignments? For some programs (e.g. BLAST but also others) this kind of information is provided in the output.

ADD REPLY • link 13.6 years ago by Philippe ★ 1.9k

0

Entering edit mode

@philippe Thanks for your comment. I used R to do pairwise sequence alignment using Pairwisealignment function(seqinr package) and calculated sequence identity by using pid function(Biostrings package)

ADD REPLY • link 13.6 years ago by Lakshmi ▴ 60

score 4 · Answer 1 · 2011-12-21

This function will give the sequence coverage

   seq.cov <- function(x){
    if(!class(x)=="PairwiseAlignedFixedSubject"){
      message('Only supports objects from class PairwiseAlignedFixedSubject')
      stop()
    }
    length.string1 <- x@subject@unaligned@ranges@width
    length.string2 <- x@pattern@unaligned@ranges@width
    align.length1 <- x@pattern@range@width
    align.length2 <- x@subject@range@width

    if(length.string1 >= length.string2){
      longest.length <- length.string1
    } else {
      longest.length <- length.string2
    }

    if(align.length1 >= align.length2){
      align.length <- align.length2
    } else {
      align.length <- align.length1
    }

    x.seq.cov <- ( align.length / longest.length ) * 100

    return(x.seq.cov)
  }

Test function:

  library(Biostrings )

  s1 <- DNAString("AGTATAGATGATAGAT")
  s2 <- DNAString("AGTAGATAGATGGATGATAGATA")

  palign1 <- pairwiseAlignment(s1, s2)

  palign2 <-
  pairwiseAlignment(s1, s2,
                    substitutionMatrix =
                    nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))

  s3 <- DNAString("AGTATAGATGATAGATGGTCC")
  s4 <- DNAString("AGTAGATAGATGGATGATAGATA")

  palign3 <- pairwiseAlignment(s3, s4)

  palign4 <-
  pairwiseAlignment(s3, s4,
                    substitutionMatrix =
                    nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))

  seq.cov(palign1)
  seq.cov(palign2)
  seq.cov(palign3)
  seq.cov(palign4)

This should do it.