I have a .bam file larger than 7 GB containing human RNA-Seq data. I used the easyRNASeq package to process it (I only want the read count table), but the code execution gets stuck because of the memory limit. I don't know how to process such a big .bam file with the easyRNASeq function. Are there any parameter settings that might help?
count.table <- easyRNASeq(filesDirectory = getwd(),
                          filenames = bamfiles,
                          organism = "Hsapiens",
                          chr.sizes = seqlengths(Hsapiens),
                          gapped = TRUE,
                          annotationMethod = "biomaRt",
                          format = "bam",
                          count = "genes",
                          summarization = "geneModels")
write.table(count.table, file = "sample_readcount.txt", sep = "\t", row.names = FALSE)
Thanks in advance!
As an aside, it's crazy that it's loading the whole BAM into memory. There's no need to do that, at least for this simple region-counting analysis. Many (hopefully most) of the other tools to do this will process the reads via streaming and have minimal memory requirements. So I'd recommend switching to another tool, like htseq-count or bedtools multicov.
That's more of an R issue than an easyRNASeq issue. R has the unfortunate habit of handling alignments by first reading them all into memory (likely in part due to the poor performance of loops).
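If you want to stay in R, the Bioconductor counting machinery can stream the BAM in chunks instead of loading it whole. Below is a minimal sketch using GenomicAlignments::summarizeOverlaps with a yieldSize-limited BamFileList; it assumes hg19/UCSC gene models and single-end reads, and "sample.bam" is a placeholder for your file, so adjust both to your data.

## Minimal sketch: chunked counting so the whole BAM never sits in memory.
## Assumes hg19/UCSC gene models and single-end reads; "sample.bam" is a placeholder.
library(Rsamtools)
library(GenomicFeatures)
library(GenomicAlignments)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

## Gene models to count against (exons grouped by gene)
genes <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")

## yieldSize makes summarizeOverlaps read the BAM in 1e6-record chunks
bfl <- BamFileList("sample.bam", yieldSize = 1e6)

se <- summarizeOverlaps(genes, bfl,
                        mode = "Union",
                        singleEnd = TRUE,
                        ignore.strand = TRUE)

count.table <- assay(se)
write.table(count.table, file = "sample_readcount.txt",
            sep = "\t", row.names = TRUE, col.names = NA)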
I just thought that this package (or this function) could handle reading the .bam file in a more sophisticated way. Thanks!
I knew about htseq-count, but our research team wants to try all the available modules/programs for calculating read counts. I'll check out bedtools multicov, thanks!
Could the BAM file be split up and worked on in smaller pieces?
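Yes, one way is to restrict each pass to a single chromosome with ScanBamParam(which = ...), so only that chromosome's alignments are in memory at a time. A rough sketch of that idea (it assumes an indexed BAM, a placeholder file name "sample.bam", and that you already have a GRangesList of gene models called `genes`):

## Rough per-chromosome sketch; assumes "sample.bam" has an index (sample.bam.bai)
## and that `genes` is a GRangesList of gene models you already have.
library(Rsamtools)
library(GenomicAlignments)

bam <- BamFile("sample.bam")
chroms <- seqlengths(bam)   # named vector: chromosome -> length

counts <- integer(length(genes))
names(counts) <- names(genes)

for (chr in names(chroms)) {
  ## Read only the alignments on this chromosome
  param <- ScanBamParam(which = GRanges(chr, IRanges(1, chroms[[chr]])))
  aln <- readGAlignments(bam, param = param)
  ## Add this chromosome's overlaps to the running totals
  counts <- counts + countOverlaps(genes, aln)
  rm(aln); gc()
}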