You definitely do not have to load the entire BAM file into R if you just want the reads from one chromosome.
The Rsamtools package lets you do this by setting the which parameter in a call to ScanBamParam, followed by a call to scanBam. See the top of page 2 of its intro vignette. Note that you will need to know how long your chromosome is (so you can supply appropriate start/end coordinates).
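A minimal sketch of the Rsamtools approach, in case it helps. To keep it runnable it uses the small example BAM file that ships with Rsamtools (whose sequences are named seq1 and seq2) rather than your own file, and it takes the sequence length from the BAM header with scanBamHeader instead of hard-coding it. The names bam.file, chrom, and the fields listed in what are just illustrative choices. Note that scanBam needs a BAM index (.bai) next to the file when which is set.

```r
library(Rsamtools)
library(GenomicRanges)

# Assumption: the example BAM bundled with Rsamtools; swap in your own path.
bam.file <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Sequence lengths come from the BAM header: a named integer vector.
seq.lengths <- scanBamHeader(bam.file)[[1]]$targets

# Assumption: "seq1" stands in for your chromosome name (e.g. "chr1").
chrom <- "seq1"

# Cover the whole sequence, from position 1 to its full length.
region <- GRanges(chrom, IRanges(start = 1, end = seq.lengths[[chrom]]))

# Only fetch reads overlapping that region, and only these fields.
param <- ScanBamParam(which = region,
                      what = c("rname", "strand", "pos", "qwidth"))

# scanBam returns one list element per range in `which`.
reads <- scanBam(bam.file, param = param)[[1]]

str(reads)
```

For your own data, replace bam.file with the path to your BAM file and chrom with the chromosome name exactly as it appears in the header (names(seq.lengths) lists them).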
I've been building a package myself (SeqTools), refactored out of code from different analyses I've been doing, that makes this easier ... it even reads the chromosome lengths from the header of the BAM file itself.
Essentially, after you install it, it will work as below (bam.file is the path to the BAM file you want to read from):
R> library(SeqTools)
R> reads.1 <- getReadsFromSequence(bam.file, 'chr1')
reads.1 will be a GRanges object with all your reads (and some meta information about them) from chromosome 1.
If you want to install it, you'll have to download or check out that project and install the R/pkg folder into R. You can do that from the command line:
$ cd to/project/base/R
$ R CMD INSTALL pkg
That should go smoothly as long as you have the required dependencies installed (see the DESCRIPTION file).
There's lots of stuff in there and little documentation, so use at your own risk :-)
Steve, that's an awesome reply, I'll try both packages and let you know my results. Thanks for taking the time to share this information!