htseq-count: error when reading beginning of SAM/BAM file.
1
0
Entering edit mode
6.6 years ago
cgias ▴ 20

Hello,

I've mapped my sample using bowtie2 with the following command:

bowtie2 -p 10 -x /home/carolgs/Cpapaya/bowtieIndex/Cpapaya_113_r.Dec2008.fa.index -1 filtered_PE1.fastq -2 filtered_PE2.fastq -S map.fastq.sam --very-sensitive-local

Now I need to count the reads. I'm trying to run HTSeq-count but it isn't working. I have a .sam file and the annotation file used while mapping is a gff3 without exons information. I'm using this command line:

samtools flagstat map.fastq.sam > flagstat 
samtools view -Sb map.fastq.sam > mapped.bam
samtools sort -o SAM mapped.bam > sorted.sam
python -m HTSeq.scripts.count -s yes -r pos -i ID -m intersection-nonempty -q sorted.sam  /home/carolgs/Cpapaya/annotation/Cpapaya_113_ASGPBv0.4.gene.gff3  >  counts_b.txt

And the error message I get in the log file is

" [Exception type: StopIteration, raised in count.py:126]
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_flagstat_core] Truncated file? Continue anyway.
[samopen] SAM header is present: 5901 sequences.
open: No such file or directory
[bam_sort_core] fail to open file SAM
Warning: No features of type 'exon' found.
Error occured when reading beginning of SAM/BAM file.

I'm not sure about the feature I have to put at -i, I've tried with ID and Name and both of them gives me the same error message. Does anybody have an idea about where is my mistake?

Thanks,

htseq htseq-count RNA-Seq rna-seq • 6.9k views
ADD COMMENT
1
Entering edit mode

With a reasonable recent version of samtools you can simplify your commands substantially:

instead of

samtools view -Sb map.fastq.sam > mapped.bam
samtools sort -o SAM mapped.bam > sorted.sam

you can use

samtools sort -o sorted.bam map.fastq.sam

which will perform sorting and conversion to bam

ADD REPLY
0
Entering edit mode

Great tip, I'm gonna improve it to my command. Thank you!

ADD REPLY
0
Entering edit mode

You must be missing header in your BAM file. What does samtools view -H your.bam produce?

ADD REPLY
0
Entering edit mode

At the beggining:

@HD VN:1.0  SO:unsorted
@SQ SN:supercontig_0    LN:6239266
@SQ SN:supercontig_1    LN:5418800
@SQ SN:supercontig_2    LN:4646840
@SQ SN:supercontig_3    LN:3706331
@SQ SN:supercontig_4    LN:3897756
@SQ SN:supercontig_5    LN:3349961
@SQ SN:supercontig_6    LN:3337819
@SQ SN:supercontig_7    LN:2695179
@SQ SN:supercontig_8    LN:2727084
@SQ SN:supercontig_9    LN:2795995
@SQ SN:supercontig_10   LN:3250320
@SQ SN:supercontig_11   LN:2969618

And the last lines:

@SQ SN:contig_48203 LN:689
@SQ SN:contig_48208 LN:682
@SQ SN:contig_48211 LN:675
@SQ SN:contig_48213 LN:675
@SQ SN:contig_48225 LN:649
@SQ SN:contig_48241 LN:623
@SQ SN:contig_48249 LN:610
@SQ SN:contig_48254 LN:599
@SQ SN:contig_48276 LN:555
@SQ SN:contig_48289 LN:544
@SQ SN:contig_48297 LN:505
@SQ SN:contig_48300 LN:496
@SQ SN:contig_48349 LN:380
@SQ SN:contig_48372 LN:318
@SQ SN:contig_48381 LN:277
@PG ID:bowtie2  PN:bowtie2  VN:2.2.3    CL:"/opt/bowtie2-2.2.3/bowtie2-align-s --wrapper basic-0 -p 10 -x /home/carolgs/Cpapaya/bowtieIndex/Cpapaya_113_r.Dec2008.fa.index -S map.fastq.sam --very-sensitive-local -1 filtered_PE1.fastq -2 filtered_PE2.fastq"
ADD REPLY
0
Entering edit mode

That looks good. I see that you had converted your bam file back into SAM for sorting. What does the header on sorted SAM file look like? You could just do head -15 sorted.sam.

There is this additional error in your log: No features of type 'exon' found.

Have you tried to use featureCounts (http://bioinf.wehi.edu.au/featureCounts/ ) with the sorted BAM file? It is fast and easier to use.

ADD REPLY
0
Entering edit mode

This head -15 sorted.sam gives me nothing back.

About this additional error, I thought it was because the annotation file I used has no exon information so I shoul tell it to htseq using -i ID, although I'm not sure it's really ID the name and I don't know how to check it.

I'll take a look at this program. Thank you!

ADD REPLY
0
Entering edit mode

This head -15 sorted.sam gives me nothing back.

Suggesting that the file is just empty, you may want to repeat the sorting step.

ADD REPLY
0
Entering edit mode

Hi GenoMax, would you please explain why my output is different?

samtools view ~/protocol_1/data/MT1/Tophat_Out/accepted_hits.sorted.bam

enter image description here

ADD REPLY
1
Entering edit mode

Looks like your file is missing SAM/BAM header. Did you not use an appropriate option (-h or -H) to keep the header?

ADD REPLY
0
Entering edit mode

GenoMax Thank you for your help!

samtools view MT1/Tophat_Out/accepted_hits.sorted.bam | python -m
HTSeq.scripts.count -q -s no - ~/Indexes/Mus_musculus/UCSC/mm10/Genes/genes.gtf >
MT1/MT1.count.txt

Error occurred when reading beginning of SAM/BAM file. file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False [Exception type: ValueError, raised in libcalignmentfile.pyx:1000]

Would you please help me to know what wrong here?

ADD REPLY
1
Entering edit mode

Please use

samtools view -h MT1/Tophat_Out/accepted_hits.sorted.bam

instead of the line you have.

ADD REPLY
0
Entering edit mode

GenoMax Yes, I can view the header with:

samtools view -h MT1/Tophat_Out/accepted_hits.sorted.bam

It is a follow-up question in which I try assemble gene expression from aligned reads with

samtools view MT1/Tophat_Out/accepted_hits.sorted.bam | python -m
HTSeq.scripts.count -q -s no - ~/Indexes/Mus_musculus/UCSC/mm10/Genes/genes.gtf > MT1/MT1.count.txt

and I got the error above.

Update. I understood. It worked. Thank you!

ADD REPLY
0
Entering edit mode

What version of samtools are you using?

ADD REPLY
1
Entering edit mode
6.6 years ago
h.mon 35k

You are mapping RNAseq reads to an eukaryotic genome with Bowtie2. This is not a good idea, as Bowtie2 is not designed to align over splice junctions. Use STAR (as you are trying on your other thread) or HISAT2.

Additionally, this command is wrong, as the resulting file should be a bam file:

samtools sort -o SAM mapped.bam > sorted.sam

Finally, STAR has a parameter --quantMode geneCounts, which will output cunts equivalent to those of HTSeq.

edit: the above samtools sort is doubly wrong, with -o you can't redirect standard output. It should be:

samtools sort -o sorted.bam mapped.bam
ADD COMMENT
0
Entering edit mode

I am mapping just a couple of samples with both programs to see which one is better, guess it really is STAR. Thanks for de correction.

I finally got it working

samtools sort mapped.bam sorted samtools view sorted.bam | htseq-count -s yes -r pos -m intersection-nonempty -i pacid - /home/carolgs/Cpapaya/STARindex2/Cpapaya_113_ASGPBv0.4.gene_exons.gff3 > counts_name.txt

ADD REPLY

Login before adding your answer.

Traffic: 1838 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6