Finding Protein Sequence Coverage With R Or Python
1
1
Entering edit mode
12.9 years ago
Lakshmi ▴ 60

Hello,

I am trying to find protein sequence coverage after pairwise sequence alignment by using the following formula

sequence coverage = (length of aligned residues/length of longest sequence)*100

How can I use this formula in R or in bio python? Is there any package available to calculate proein sequence coverage ?

protein r alignment biopython • 5.0k views
ADD COMMENT
1
Entering edit mode

How did you generate those alignments? For some programs (e.g. BLAST but also others) this kind of information is provided in the output.

ADD REPLY
0
Entering edit mode

@philippe Thanks for your comment. I used R to do pairwise sequence alignment using Pairwisealignment function(seqinr package) and calculated sequence identity by using pid function(Biostrings package)

ADD REPLY
4
Entering edit mode
12.9 years ago

This function will give the sequence coverage

   seq.cov <- function(x){
    if(!class(x)=="PairwiseAlignedFixedSubject"){
      message('Only supports objects from class PairwiseAlignedFixedSubject')
      stop()
    }
    length.string1 <- x@subject@unaligned@ranges@width
    length.string2 <- x@pattern@unaligned@ranges@width
    align.length1 <- x@pattern@range@width
    align.length2 <- x@subject@range@width

    if(length.string1 >= length.string2){
      longest.length <- length.string1
    } else {
      longest.length <- length.string2
    }

    if(align.length1 >= align.length2){
      align.length <- align.length2
    } else {
      align.length <- align.length1
    }

    x.seq.cov <- ( align.length / longest.length ) * 100

    return(x.seq.cov)
  }

Test function:

  library(Biostrings )

  s1 <- DNAString("AGTATAGATGATAGAT")
  s2 <- DNAString("AGTAGATAGATGGATGATAGATA")

  palign1 <- pairwiseAlignment(s1, s2)

  palign2 <-
  pairwiseAlignment(s1, s2,
                    substitutionMatrix =
                    nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))

  s3 <- DNAString("AGTATAGATGATAGATGGTCC")
  s4 <- DNAString("AGTAGATAGATGGATGATAGATA")

  palign3 <- pairwiseAlignment(s3, s4)

  palign4 <-
  pairwiseAlignment(s3, s4,
                    substitutionMatrix =
                    nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))

  seq.cov(palign1)
  seq.cov(palign2)
  seq.cov(palign3)
  seq.cov(palign4)

This should do it.

ADD COMMENT
0
Entering edit mode

@Mischa Thank you so much.

ADD REPLY

Login before adding your answer.

Traffic: 2704 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6