Question

"Invalid BAM file header: missing sequence name in file" when trying to visualise .bam file in IGV

0

Entering edit mode

6.5 years ago

francois ▴ 80

I need help with visualising a .bam file.

So, what I did up to now was:

I had one .fastq of 3.9 MB file containing many reads generated by Metrichor (I am sequencing using the Nanopore MinION). I aligned the reads to a reference .fasta sequence using the package ngmlr. I got a .sam file of 4.3 MB. The command was:

ngmlr -t 4 -r ref.fasta -q reads.fastq -o test.sam -x ont

I converted this .sam file to a .bam file using samtools. I got a .bam file of 2 MB. The command I used was:

samtools view -bT ref.fasta test.sam > test.bam

I sorted the .bam file with samtools. The command was:

samtools sort test.bam > test_sorted.bam

I indexed the .bam file using samtools. I got a .bai file of the same name of 96 bytes. The command was:

 samtools index test_sorted.bam

Now I tried opening the .bam file in IGV_2.4.10. But I received the following error message:

Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file

What did I do wrong?

alignment bam igv • 5.7k views

ADD COMMENT • link updated 6.5 years ago by WouterDeCoster 47k • written 6.5 years ago by francois ▴ 80

0

Entering edit mode

Ah that would have been great! But same problem... Any other ideas?

The first few lines of my .sam file (just before the conversion to .bam step) looks like that:

@HD VN:1.0 SO:unsorted

@SQ SN: LN:877

@PG ID:ngmlr PN:nextgenmap-lr VN:0.2.6 CL:ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont

01c372ae-0d41-4ba4-bd0c-c56498801f53 4 * 0 0 * TGGGACGCTCGTTCAGTTA

Does that look OK to you?

ADD REPLY • link 6.5 years ago by francois ▴ 80

0

Entering edit mode

Ok so I tried adding something in the @SQ SN: header of the .sam file, as it seems like this is the "sequence name". But it does not look like the conversion to .bam worked fine. I have this message:

[W::sam_parse1] urecognized reference name; treated as unmapped

Exactly the same number of times as my number of aligned reads, which does not sound very good.

If I continue anyway with sorting and indexing, the error message from IGV is:

does not contain any sequence names which match the current genome.

File: PRNP, # that's the name I added

Genome: chr1, chr2, chr3, chr4, ...

I also tried adding chr20 in the .sam file header, exact same.

So I guess the problem does come from there, but I am not sure how I should name the sequence... Also, to be clear, my reference sequence in .fasta I am aligning to is a very short sequence (around 1kb), not a full genome or anything. Is that OK to do?

Thanks

ADD REPLY • link 6.5 years ago by francois ▴ 80

1

Entering edit mode

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

You should avoid manual tampering with bams.

ADD REPLY • link 6.5 years ago by WouterDeCoster 47k

1

Entering edit mode

I also tried adding chr20 in the .sam file header, exact same.

This won't work, because each mapped read will still have a wrong chromosome name. But i wonder how chromosome names don't match, as apparently you used the same reference at all steps. Did you load the same reference genome on IGV before opening the bam file?

Also, what are the outputs of:

grep ">" reference.fasta | head

And

samtools view test.sam | head

ADD REPLY • link 6.4 years ago by h.mon 35k

0

Entering edit mode

I created this reference myself, so it is whatever I called it:

grep ">" ref.fasta | head

106_46_amplicons_ref

After that, the .fasta reference line is just a one line sequence of ~ 1k characters in lower case.

And

samtools view align.sam | head

37d3e25f-875a-4cac-967a-b26d51342613 4 * 0 0 * * 0 0 CACGAAAAAAAA

And many many lines of alignments. It is not giving the header for some reason, is that what you expected? Just opening align.sam in a text editor, I can see the actual header, which looks like:

@HD VN:1.0 SO:unsorted @SQ SN: LN:877 @PG ID:ngmlr PN:nextgenmap-lr VN:0.2.6 CL:ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont

(Notice that there is nothing after SN:).

Just so I summarise the sequence of commands I am doing right now:

ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont
samtools view align.sam -o align.bam
samtools sort align.bam > sort.bam
samtools index sort.bam
Changing the reference genome in IGV with Genomes > Load genome from file. And I select ref.fasta.
Drag and drop sort.bam in IGV.
Error message: Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file

Does the name of the reference need to match something in the .sam file by any chance?

ADD REPLY • link 6.4 years ago by francois ▴ 80

score 2 · Answer 1 · 2018-06-15

2

Entering edit mode

6.5 years ago

WouterDeCoster 47k

You can simplify your commands substantially and avoid intermediate files:

zcat reads.fastq.gz | ngmlr --presets ont -t 8 -r genome.fa | samtools sort  -o alignment.bam -

Which should work okay in IGV.

ADD COMMENT • link 6.5 years ago by WouterDeCoster 47k

0

Entering edit mode

True. I don't really know how that would help solve my problem though, it is the same as running the commands sequentially.

ADD REPLY • link 6.4 years ago by francois ▴ 80

0

Entering edit mode

It's not the same as your commands, since you apparently messed with the header. Anyway, I use that command often and it works fine for me, also in igv.

ADD REPLY • link 6.4 years ago by WouterDeCoster 47k

score 1 · Answer 2 · 2018-06-14

1

Entering edit mode

6.5 years ago

h.mon 35k

I believe you need samtools view -bhT ref.fasta test.sam > test.bam, otherwise the sam header is lost.

-h include header in SAM output

ADD COMMENT • link 6.5 years ago by h.mon 35k

1

Entering edit mode

You only need:

samtools view test.sam -o test.bam

(assuming a reasonably recent version of samtools)

ADD REPLY • link 6.5 years ago by WouterDeCoster 47k

0

Entering edit mode

No, -b implies -h, so this cannot be the issue.

ADD REPLY • link 6.5 years ago by ATpoint 85k