Filter coverage in R
1
0
Entering edit mode
2.6 years ago
msening • 0

Hi, I am using multiple Bam files to get the coverage for each of them.

file1 <- BamFile("bamfile.sorted.bam")
x<-readGAlignments(file1)
xcov<-coverage(x)

And then I wanted to draw the coverage plots:

lapply(xcov, function(x) {
  plot(as.numeric(x), type="l")
})

But I have up to 3000 segments in xcov (the RleList object), so I would like to filter them and keep the ones that have more than 1 run. For example, I don't want these in my coverage plots:

RleList of length 3138

    $seg1
    integer-Rle of length 10731 with 1 run
      Lengths: 10731
      Values :     0

    $seg2
    integer-Rle of length 2789 with 1 run
      Lengths: 2789
      Values :    0

I tried to do this:

lapply(xcov, function(x) {
  plot(as.numeric(x>0), type="l")
})

But it didn't do what I wanted it to do, is there any other way to fix this?

coverage alignment R • 737 views
ADD COMMENT
2
Entering edit mode
2.6 years ago

I think you can filter your xcov list like so:

#create sample data
xcov <- list(seq1=rle(rbinom(100,10,0.05)),
             seq2=rle(rep(0, 100)))

# remove the seqs with only 0 values
xcov2 <- xcov[sapply(lapply(xcov,"[[",2),function(x){!all(x==0)})]

# plot the remainders
lapply(xcov2, function(x) {
  plot(inverse.rle(x), type="l")
})
ADD COMMENT
0
Entering edit mode

Hi, I tried it but it gave me an error like this:

Error in h(simpleError(msg, call)) : error in evaluating the argument 'i' in selecting a method for function '[': error in evaluating the argument 'X' in selecting a method for function 'sapply': this S4 class is not subsettable

I tried your xcov and it works, but my xcov is different, which is probably because I read it from a bam file. I just don't really understand the error and wonder if you could help me with it?

ADD REPLY
0
Entering edit mode

True. It is a package specific derivative data type. But actually it is even easier with this than a generic RLE list.

xcov_reduced <- xcov[sapply(xcov,function(x){as.logical(sum(as.numeric(x)))})]

I am using sapply to apply a function to each element of xcov. I am converting the RLE it back to the numeric vector (notice that the function used here is as.numeric, which only works because it is redefined by the package, usually you would need to use inverse.rle for this). Then I calculate the sum, which is 0 for uncovered segments (having no runs) and a positive number for all others. This information needs to be translated to a logical TRUE (keep) and FALSE (delete) pattern. as.logical conveniently does exactly this, but a sum(as.numeric(x)) > 0 would do exactly the same. The output of this is then directly used to subset the original data xcov[...].

#A full minimal example:

library("GenomicAlignments")
bamfile <- system.file("extdata", "ex1.bam", package="Rsamtools",
                       mustWork=TRUE)
gal1 <- readGAlignments(bamfile)
xcov <- coverage(gal1)

xcov_reduced <- xcov[sapply(xcov,function(x){as.logical(sum(as.numeric(x)))})]  
ADD REPLY

Login before adding your answer.

Traffic: 1844 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6