I have a .bam file larger than 7 GB containing human RNA-Seq data. I used the easyRNASeq package to process it (I only want the read count table), but the code execution gets stuck because of the memory limit. I don't know how to process such a big .bam file with the easyRNASeq function. Are there any parameter settings that might help?
count.table <- easyRNASeq(filesDirectory = getwd(),
                          filenames = bamfiles,
                          organism = "Hsapiens",
                          chr.sizes = seqlengths(Hsapiens),
                          gapped = TRUE,
                          annotationMethod = "biomaRt",
                          format = "bam",
                          count = "genes",
                          summarization = "geneModels")
write.table(count.table, file = "sample_readcount.txt", sep = "\t", row.names = FALSE)
Thanks in advance!
As an aside, it's crazy that it's loading the whole BAM into memory. There's no need to do that, at least for this simple region-counting analysis. Many (hopefully most) of the other tools to do this will process the reads via streaming and have minimal memory requirements. So I'd recommend switching to another tool, like htseq-count or bedtools multicov.
That's more of an R issue than an easyRNASeq issue. R has the unfortunate habit of handling alignments by first reading them all into memory (likely in part due to the poor performance of loops).
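If you want to stay in R, the Bioconductor counting machinery can stream the BAM in chunks instead of loading it whole. Below is a minimal sketch using GenomicAlignments::summarizeOverlaps with a yieldSize-limited BamFileList; it assumes hg19/UCSC gene models and single-end reads, and "sample.bam" is a placeholder for your file, so adjust both to your data.

## Minimal sketch: chunked counting so the whole BAM never sits in memory.
## Assumes hg19/UCSC gene models and single-end reads; "sample.bam" is a placeholder.
library(Rsamtools)
library(GenomicFeatures)
library(GenomicAlignments)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

## Gene models to count against (exons grouped by gene)
genes <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")

## yieldSize makes summarizeOverlaps read the BAM in 1e6-record chunks
bfl <- BamFileList("sample.bam", yieldSize = 1e6)

se <- summarizeOverlaps(genes, bfl,
                        mode = "Union",
                        singleEnd = TRUE,
                        ignore.strand = TRUE)

count.table <- assay(se)
write.table(count.table, file = "sample_readcount.txt",
            sep = "\t", row.names = TRUE, col.names = NA)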
I just thought that this package (or this function) could handle reading the .bam file in a more sophisticated way. Thanks!
I knew about htseq-count, but our research team wants to try all the available modules/programs for calculating read counts. I'll check out bedtools multicov, thanks!
Could the BAM file be split up and worked on in smaller pieces?
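Yes, one way is to restrict each pass to a single chromosome with ScanBamParam(which = ...), so only that chromosome's alignments are in memory at a time. A rough sketch of that idea (it assumes an indexed BAM, a placeholder file name "sample.bam", and that you already have a GRangesList of gene models called `genes`):

## Rough per-chromosome sketch; assumes "sample.bam" has an index (sample.bam.bai)
## and that `genes` is a GRangesList of gene models you already have.
library(Rsamtools)
library(GenomicAlignments)

bam <- BamFile("sample.bam")
chroms <- seqlengths(bam)   # named vector: chromosome -> length

counts <- integer(length(genes))
names(counts) <- names(genes)

for (chr in names(chroms)) {
  ## Read only the alignments on this chromosome
  param <- ScanBamParam(which = GRanges(chr, IRanges(1, chroms[[chr]])))
  aln <- readGAlignments(bam, param = param)
  ## Add this chromosome's overlaps to the running totals
  counts <- counts + countOverlaps(genes, aln)
  rm(aln); gc()
}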