Hi all
I have been trying to use htseq-count on paired-end RNA-seq data but keep running into a memory error, which seems to come from HTSeq itself rather than from the directory. Does anyone have a solution to this? I am using Python 2.7.1, and the input is sorted by position; I have also tried sorting by name, to no avail.
Command:
htseq-count --mode=union --stranded=yes --order=pos Mutant1_align_filtered_sorted.sam genes.gtf > list
Output:
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
671983 GFF lines processed.
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
300000 SAM alignment record pairs processed.
400000 SAM alignment record pairs processed.
500000 SAM alignment record pairs processed.
600000 SAM alignment record pairs processed.
700000 SAM alignment record pairs processed.
800000 SAM alignment record pairs processed.
900000 SAM alignment record pairs processed.
1000000 SAM alignment record pairs processed.
1100000 SAM alignment record pairs processed.
1200000 SAM alignment record pairs processed.
1300000 SAM alignment record pairs processed.
1400000 SAM alignment record pairs processed.
1500000 SAM alignment record pairs processed.
1600000 SAM alignment record pairs processed.
1700000 SAM alignment record pairs processed.
1800000 SAM alignment record pairs processed.
Error occured when processing SAM input (line 5693558 of file Mutant1_align_filtered_sorted.sam):
[Exception type: MemoryError, raised in _HTSeq.pyx:1398]
Thanks!
While you wait for someone to provide a solution, I suggest giving featureCounts a try. It is much faster and accepts sorted or unsorted BAM/SAM files.
Hi genomax
Thanks for the tip. featureCounts did run much faster and the counting worked fine, but I can't figure out how to feed the featureCounts output file into DESeq2 downstream. I've tried something along these lines, but to be honest I really can't figure it out:
Read the counts with
counts <- read.table()
and examine the data with summary(counts). With some luck the problem will stand out easily.
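For the featureCounts-to-DESeq2 step specifically, here is a minimal base-R sketch of what that read.table() call could look like. It assumes the usual featureCounts output layout: a "# Program:featureCounts ..." comment line, then a tab-separated header Geneid, Chr, Start, End, Strand, Length followed by one count column per input file. The helper name read_featurecounts and the file name counts.txt are hypothetical, not part of featureCounts or DESeq2.

```r
# Minimal sketch (hypothetical helper): read a featureCounts output table
# into an integer gene x sample matrix that DESeq2 can use as countData.
# Assumes the standard featureCounts layout: a "# Program:..." comment line,
# then Geneid, Chr, Start, End, Strand, Length, one count column per file.
read_featurecounts <- function(path) {
  fc <- read.table(path, header = TRUE, sep = "\t", quote = "",
                   comment.char = "#", row.names = 1,
                   check.names = FALSE, stringsAsFactors = FALSE)
  # Geneid became the row names, so columns 1-5 are Chr..Length;
  # every column after them holds the counts for one input BAM/SAM file.
  counts <- as.matrix(fc[, -(1:5), drop = FALSE])
  # DESeq2 expects whole-number counts (this truncates fractional counts,
  # which only appear if featureCounts was run with --fraction).
  storage.mode(counts) <- "integer"
  counts
}

# Hypothetical usage with the file written by featureCounts -o counts.txt:
# cts <- read_featurecounts("counts.txt")
# summary(cts)
```

The matrix can then be passed to DESeq2 with DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition), where coldata is a data frame you build yourself with one row per sample (row names matching colnames(cts)) and a column such as condition describing the groups; coldata and condition are placeholders here.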