Hello,
I've mapped my sample using bowtie2 with the following command:
bowtie2 -p 10 -x /home/carolgs/Cpapaya/bowtieIndex/Cpapaya_113_r.Dec2008.fa.index -1 filtered_PE1.fastq -2 filtered_PE2.fastq -S map.fastq.sam --very-sensitive-local
Now I need to count the reads. I'm trying to run HTSeq-count but it isn't working. I have a .sam file and the annotation file used while mapping is a gff3 without exons information. I'm using this command line:
samtools flagstat map.fastq.sam > flagstat
samtools view -Sb map.fastq.sam > mapped.bam
samtools sort -o SAM mapped.bam > sorted.sam
python -m HTSeq.scripts.count -s yes -r pos -i ID -m intersection-nonempty -q sorted.sam /home/carolgs/Cpapaya/annotation/Cpapaya_113_ASGPBv0.4.gene.gff3 > counts_b.txt
And the error message I get in the log file is
" [Exception type: StopIteration, raised in count.py:126]
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_flagstat_core] Truncated file? Continue anyway.
[samopen] SAM header is present: 5901 sequences.
open: No such file or directory
[bam_sort_core] fail to open file SAM
Warning: No features of type 'exon' found.
Error occured when reading beginning of SAM/BAM file.
I'm not sure about the feature I have to put at -i, I've tried with ID and Name and both of them gives me the same error message. Does anybody have an idea about where is my mistake?
Thanks,
With a reasonable recent version of samtools you can simplify your commands substantially:
instead of
you can use
which will perform sorting and conversion to bam
Great tip, I'm gonna improve it to my command. Thank you!
You must be missing header in your BAM file. What does
samtools view -H your.bam
produce?At the beggining:
And the last lines:
That looks good. I see that you had converted your bam file back into SAM for sorting. What does the header on sorted SAM file look like? You could just do
head -15 sorted.sam
.There is this additional error in your log:
No features of type 'exon' found.
Have you tried to use
featureCounts
(http://bioinf.wehi.edu.au/featureCounts/ ) with the sorted BAM file? It is fast and easier to use.This
head -15 sorted.sam
gives me nothing back.About this additional error, I thought it was because the annotation file I used has no exon information so I shoul tell it to htseq using
-i ID
, although I'm not sure it's really ID the name and I don't know how to check it.I'll take a look at this program. Thank you!
Suggesting that the file is just empty, you may want to repeat the sorting step.
Hi GenoMax, would you please explain why my output is different?
Looks like your file is missing SAM/BAM header. Did you not use an appropriate option (-h or -H) to keep the header?
GenoMax Thank you for your help!
Error occurred when reading beginning of SAM/BAM file. file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False [Exception type: ValueError, raised in libcalignmentfile.pyx:1000]
Would you please help me to know what wrong here?
Please use
instead of the line you have.
GenoMax Yes, I can view the header with:
It is a follow-up question in which I try assemble gene expression from aligned reads with
and I got the error above.
Update. I understood. It worked. Thank you!
What version of
samtools
are you using?