4.1 years ago
bertb
Hello,
I am trying to produce gene-level counts for my RNA-seq data, and I am running into a problem processing my BAM files with htseq-count.
When I enter the command:
htseq-count --format bam --order pos --mode intersection-strict --stranded no --minaqual 1 --type exon --idattr gene_id $RNA_ALIGN_DIR/UWN.bam $RNA_REF_GTF > UWN_gene.tsv
Everything appears to process normally, but after about 9.1 million read pairs the progress output stops and the process ends with the message "Killed":
...
8100000 SAM alignment record pairs processed.
8200000 SAM alignment record pairs processed.
8300000 SAM alignment record pairs processed.
8400000 SAM alignment record pairs processed.
8500000 SAM alignment record pairs processed.
8600000 SAM alignment record pairs processed.
8700000 SAM alignment record pairs processed.
8800000 SAM alignment record pairs processed.
8900000 SAM alignment record pairs processed.
9000000 SAM alignment record pairs processed.
9100000 SAM alignment record pairs processed.
Killed
I ran into this before and got a "memory failure" at the time, so I upgraded my system. I'm on an Ubuntu VM, now with 24 GB of RAM and 2 TB of disk space, which I thought would be sufficient.
Any help would be appreciated!
Thanks,
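A bare "Killed" with no Python traceback usually means the process received SIGKILL, most often from the Linux out-of-memory (OOM) killer. Below is a minimal sketch for checking, from inside the VM, how much RAM the guest actually sees and whether the kernel logged an OOM kill. It assumes a standard Ubuntu userland (free, awk, dmesg); mem_total_gib is a hypothetical helper name.

```shell
# Hypothetical helper: MemTotal as seen *inside* the VM, in GiB
# (/proc/meminfo reports the value in kB).
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Total, used and available memory (and swap) in human-readable units
free -h
echo "MemTotal: $(mem_total_gib) GiB"

# If the OOM killer ended htseq-count, the kernel log should record it.
# On Ubuntu, reading the kernel log may require sudo.
dmesg -T 2>/dev/null | grep -i -E 'out of memory|killed process' \
    || echo "No OOM entries visible; try: sudo dmesg -T | grep -i 'killed process'"
```

If MemTotal is well below the amount configured for the VM, the hypervisor setting has not taken effect.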
Please double-check how many GB of RAM are actually allocated to the VM. Alternatively, use featureCounts, which is memory-efficient and blazingly fast.
Thanks for your response.
I've checked the allocation: of the 32 GB on my host system, I've allocated 24 GB to the VM. Do you think this is sufficient? Is there any reason to choose featureCounts over htseq-count besides memory requirements?
Thanks,
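One commonly used workaround, which does not come from this thread: with --order pos, htseq-count holds reads whose mate has not yet been seen in a buffer (capped by --max-reads-in-buffer), and that buffer can consume a lot of memory. Name-sorting the BAM first and running with --order name keeps mates adjacent, so the buffer stays small. This is a minimal sketch, assuming samtools and htseq-count are on PATH and reusing the options from the original command. The helper name count_name_sorted, the .namesorted.bam suffix and the 4 sorting threads are hypothetical choices.

```shell
# Hypothetical helper: name-sort one BAM, then count genes with htseq-count.
# Assumes samtools and htseq-count are on PATH and $RNA_REF_GTF is set.
count_name_sorted() {
    local bam="$1" out="$2"
    local namesorted="${bam%.bam}.namesorted.bam"

    # Sort by read name so both mates of each pair are adjacent
    samtools sort -n -@ 4 -o "$namesorted" "$bam" || return 1

    # Same counting options as before, but --order name means
    # htseq-count does not have to buffer unpaired mates in memory
    htseq-count --format bam --order name --mode intersection-strict \
        --stranded no --minaqual 1 --type exon --idattr gene_id \
        "$namesorted" "$RNA_REF_GTF" > "$out"
}
```

For example: count_name_sorted "$RNA_ALIGN_DIR/UWN.bam" UWN_gene.tsv. Note that samtools sort itself needs memory (set per thread with -m) and writes temporary files to disk.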
With featureCounts, you can supply your entire list of BAM files at once (in the order you want the count columns to appear) and get a count matrix (genes in rows, samples in columns) in one shot. featureCounts takes care of name-sorting paired-end reads automatically and then gives you the counts with minimal fuss.
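Below is a sketch of the one-shot featureCounts call described above. It assumes the Subread package is installed and reuses $RNA_REF_GTF from the question. The helper name count_all and the thread count are hypothetical. The options roughly mirror the htseq-count command, but featureCounts' overlap rules are not identical to --mode intersection-strict, so counts will differ somewhat.

```shell
# Hypothetical helper: count all BAMs in one featureCounts run.
# -p: paired-end input; -s 0: unstranded; -Q 1: minimum mapping quality;
# -t exon / -g gene_id: feature type and attribute used to group exons into genes.
# BAMs are counted in the order given, one output column each.
count_all() {
    local out="$1"; shift
    featureCounts -p -s 0 -Q 1 -t exon -g gene_id -T 4 \
        -a "$RNA_REF_GTF" -o "$out" "$@"
}
```

For example, count_all gene_counts.txt "$RNA_ALIGN_DIR"/UWN.bam "$RNA_ALIGN_DIR"/SAMPLE2.bam, where SAMPLE2.bam stands in for a second sample. The output table has six annotation columns (Geneid, Chr, Start, End, Strand, Length) followed by one count column per BAM. A .summary file with assignment statistics is written alongside it. In recent Subread releases, -p only marks the input as paired-end, and --countReadPairs is needed to count fragments rather than individual reads. Check the usage message of your installed version.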