Hi everyone,
I am a newborn in the sequencing fields and I have some problem with HTSeq. I know that there are a lot of posts about this problem but, even if I read all of them, I am not able to fix the problem I have.
I have PE RNA-seq data, aligned with TopHat. I want to use HTSeq-count to create the input for DESeq2 analysis.
Once obtained the accepted_hits.bam
file what I did was:
samtools flagstat accepted_hits.bam > stat.txt
samtools index accepted_hits.bam
samtools view accepted_hits.bam > accepted_hits.sam
sort -k1,1 -k2,2n accepted_hits.sam > accepted_hits_sort.sam
htseq-count -f sam -m union -r pos -s no -t exon -i gene_id accepted_hits_sort.sam gtfFiles/Homo_sapiens.GRCh38.77.gtf > Sort_Pos.txt
This works fine with an accepted_hits.bam file of 3 GB with around 50 million reads mapped, but whan I moved to a 5 GB, 100 million reads mapped file I got this error:
Error occured when processing SAM input (record #35593187 in file accepted_hits_sort.sam):
Maximum alignment buffer size exceeded while pairing SAM alignments.
[Exception type: ValueError, raised in __init__.py:671]
I tried different option:
- increasing the max buffer size of HTSeq-count (as suggest in http://seqanswers.com/forums/showthread.php?t=41531) ----> same error
- sorting the bam per name instead of per position ----> Everything work but the feature I count corresponds to reads instead to fragment (probably for the resons described in the last reply of htseq-count for pair-end RNA-seq
According to the last reply of previous post I try to rename my reads using
(samtools view -H accepted_hits.bam; accepted_hits.bam | awk '{print substr($1,1,length($1)-1),$0}' | sed 's/ [^ ]*//') | samtools view -bSh - > in.samereadname.bam
that return me this error:
So now my questions are:
- Is what I did correct?
- How can I solve the problem with the bigger bam file?
Thank you so much for your responses.
the correct way of proceeding is the second one. unless you have old data, the read name should be the same for both mates. you can check this by looking at your first bam lines. if you indeed see that the two mates end with
\1
and\2
, then you can run that one-liner with the correction of my other comment.Sorry I forgot to report the error obtained from option 3:
the code you used is missing the command:
samtools view
which you have to insert after the first semi-colon. like this:
Thank you so much for your precious help...I will inform you about the results.