Question

HT-seq count memory error

7.5 years ago

s1469060 ▴ 10

Hi all

I have been trying to run htseq-count on paired-end RNA-seq data but keep running into a memory error, which seems to be an HTSeq problem rather than a directory problem. Does anyone have a solution? I am using Python 2.7.1, and the input is sorted by position; I have also tried sorting by name, to no avail.

Command:

htseq-count --mode=union --stranded=yes --order=pos Mutant1_align_filtered_sorted.sam genes.gtf > list 

Output:
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
671983 GFF lines processed.
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
300000 SAM alignment record pairs processed.
400000 SAM alignment record pairs processed.
500000 SAM alignment record pairs processed.
600000 SAM alignment record pairs processed.
700000 SAM alignment record pairs processed.
800000 SAM alignment record pairs processed.
900000 SAM alignment record pairs processed.
1000000 SAM alignment record pairs processed.
1100000 SAM alignment record pairs processed.
1200000 SAM alignment record pairs processed.
1300000 SAM alignment record pairs processed.
1400000 SAM alignment record pairs processed.
1500000 SAM alignment record pairs processed.
1600000 SAM alignment record pairs processed.
1700000 SAM alignment record pairs processed.
1800000 SAM alignment record pairs processed.
Error occured when processing SAM input (line 5693558 of file Mutant1_align_filtered_sorted.sam):

  [Exception type: MemoryError, raised in _HTSeq.pyx:1398]

Thanks!

Tag: RNA-Seq

While you wait for someone to provide a solution, I suggest that you give featureCounts a try. It is much faster and will take sorted or unsorted BAM/SAM files.

Reply by GenoMax, 7.5 years ago
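A minimal sketch of the featureCounts route, assuming the same paired-end, stranded library as the htseq-count command in the question (htseq-count --stranded=yes corresponds to featureCounts -s 1) and a GTF with gene_id attributes. The wrapper name run_featurecounts is hypothetical. In recent Subread releases (2.0.2 onward, as far as I know), -p only declares the input as paired-end and --countReadPairs is also needed to count fragments rather than individual reads; check featureCounts --help for your version.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: count paired-end reads per gene with featureCounts (Subread).
# Usage: run_featurecounts ANNOTATION.gtf OUTPUT.txt SAMPLE1.bam [SAMPLE2.bam ...]
run_featurecounts() {
  local gtf=$1 out=$2
  shift 2
  # -p         : input is paired-end (on newer Subread, add --countReadPairs to count fragments)
  # -s 1       : stranded, same read orientation as htseq-count --stranded=yes
  # -t exon    : count over exon features (the default for GTF input)
  # -g gene_id : summarise exon counts per gene via gene_id (also the default)
  featureCounts -p -s 1 -t exon -g gene_id -a "$gtf" -o "$out" "$@"
}
```

For example, run_featurecounts genes.gtf FeatureCountAll Mutant1.bam Control1.bam (placeholder BAM names) writes one count column per input file. Since featureCounts accepts sorted or unsorted SAM/BAM, no re-sorting step is needed.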

Hi GenoMax

Thanks for the tip. featureCounts was indeed much faster and the counting worked fine, but I can't figure out how to load the count output file into DESeq2 downstream. I've tried something along these lines, but to be honest I really can't figure it out:

countsTable <- DESeqDataSetFromMatrix(countData="/Volumes/igmm/hill-lab/Zoe/RNA-seq/E10.5_G2-67/DEseq/FeatureCountAll", colData= colData, design =  ~ genotype)
Error in DESeqDataSet(se, design = design, ignoreRank) : 
  some values in assay are negative

Reply by s1469060, 7.5 years ago (updated by GenoMax)

Read the counts with counts <- read.table() and examine the data with summary(counts). With some luck the problem will stand out easily.

Reply by h.mon, 7.5 years ago
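The DESeqDataSetFromMatrix call above passes a file path as countData, but that argument expects a matrix of non-negative integer counts (genes in rows, samples in columns); handing it a character string is the likely source of the confusing "some values in assay are negative" error. A featureCounts output file also starts with a "#" comment line and has five annotation columns (Chr, Start, End, Strand, Length) between Geneid and the counts, which must not end up in the matrix. A minimal sketch for stripping them, assuming the standard featureCounts column layout; fc_to_matrix is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a featureCounts output table into a plain count matrix.
# Assumes the usual featureCounts layout: a "#" comment line, then the columns
# Geneid, Chr, Start, End, Strand, Length, followed by one count column per BAM file.
fc_to_matrix() {
  # Drop comment lines, keep Geneid (column 1) and the count columns (7 onwards).
  grep -v '^#' "$1" | cut -f 1,7-
}
```

For example, fc_to_matrix FeatureCountAll > counts_matrix.txt. In R, that file can then be read with read.table("counts_matrix.txt", header = TRUE, row.names = 1, check.names = FALSE), converted with as.matrix(), inspected with summary() as suggested above, and passed as countData. The count columns are headed by the BAM file names, so check that they are in the same order as the rows of colData.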

score 1 · Answer 1 · 2017-06-01

7.5 years ago

s1469060 ▴ 10

Just to post the solution to this: it worked fine once I switched to BAM input and sorted by name. I'm not sure why this mattered, seeing as htseq-count should work with --order set to either pos or name and with either SAM or BAM input, but it worked!

Thanks. Zoe

score 0 · Answer 2 · 2017-05-22

7.5 years ago

h.mon 35k

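Although using featureCounts, as GenoMax suggested, is the better option, if you want to stick with HTSeq I suggest sorting your SAM file by name and setting --order=name accordingly. This memory error with position-sorted files is a known, long-standing issue: with position-sorted paired-end data, htseq-count has to keep reads in memory until it reaches their mates, which can be far away in the file.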

P.S.: you mentioned your Python version, but not your HTSeq version. Don't you think that is important as well?
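A minimal sketch of the name-sort workaround suggested here (which is also what the original poster reported working), assuming samtools 1.3 or later for the sort -n -o syntax and an HTSeq install with pysam available for BAM input. It name-sorts the alignments into BAM, then runs htseq-count with --format=bam --order=name while keeping the other options from the original command. The function name count_name_sorted and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name-sort alignments into BAM, then count with htseq-count.
# Usage: count_name_sorted INPUT.sam|INPUT.bam ANNOTATION.gtf OUTPUT_PREFIX
count_name_sorted() {
  local input=$1 gtf=$2 prefix=$3
  local sorted="${prefix}.namesorted.bam"
  # -n sorts by read name so both mates of a pair sit next to each other;
  # samtools sort writes BAM by default. Stop if sorting fails.
  samtools sort -n -o "$sorted" "$input" || return 1
  # Same counting options as the original command, but BAM input ordered by name.
  htseq-count --format=bam --order=name --mode=union --stranded=yes \
    "$sorted" "$gtf" > "${prefix}.counts.txt"
}
```

For example, count_name_sorted Mutant1_align_filtered_sorted.sam genes.gtf Mutant1 would write Mutant1.namesorted.bam and Mutant1.counts.txt. With name-sorted input each read's mate normally follows immediately, so htseq-count does not need to buffer unmatched mates the way it does with --order=pos.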

Hi h.mon

I am using HTSeq version 0.7.2. I have sorted by name and set --order accordingly, but it still seems to run into HTSeq errors sporadically.

Reply by s1469060, 7.5 years ago