I need help with visualising a .bam file.
So, what I did up to now was:
I had one .fastq
of 3.9 MB file containing many reads generated by Metrichor (I am sequencing using the Nanopore MinION). I aligned the reads to a reference .fasta
sequence using the package ngmlr
. I got a .sam
file of 4.3 MB. The command was:
ngmlr -t 4 -r ref.fasta -q reads.fastq -o test.sam -x ont
I converted this .sam
file to a .bam
file using samtools
. I got a .bam
file of 2 MB. The command I used was:
samtools view -bT ref.fasta test.sam > test.bam
I sorted the .bam
file with samtools
. The command was:
samtools sort test.bam > test_sorted.bam
I indexed the .bam
file using samtools
. I got a .bai
file of the same name of 96 bytes. The command was:
samtools index test_sorted.bam
Now I tried opening the .bam
file in IGV_2.4.10
. But I received the following error message:
Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file
What did I do wrong?
Ah that would have been great! But same problem... Any other ideas?
The first few lines of my
.sam
file (just before the conversion to.bam
step) looks like that:Does that look OK to you?
Ok so I tried adding something in the
@SQ SN:
header of the.sam
file, as it seems like this is the "sequence name". But it does not look like the conversion to.bam
worked fine. I have this message:Exactly the same number of times as my number of aligned reads, which does not sound very good.
If I continue anyway with sorting and indexing, the error message from IGV is:
I also tried adding
chr20
in the.sam
file header, exact same.So I guess the problem does come from there, but I am not sure how I should name the sequence... Also, to be clear, my reference sequence in
.fasta
I am aligning to is a very short sequence (around 1kb), not a full genome or anything. Is that OK to do?Thanks
Please use
ADD COMMENT
orADD REPLY
to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.You should avoid manual tampering with bams.
This won't work, because each mapped read will still have a wrong chromosome name. But i wonder how chromosome names don't match, as apparently you used the same reference at all steps. Did you load the same reference genome on IGV before opening the bam file?
Also, what are the outputs of:
And
I created this reference myself, so it is whatever I called it:
grep ">" ref.fasta | head
After that, the
.fasta
reference line is just a one line sequence of ~ 1k characters in lower case.And
samtools view align.sam | head
And many many lines of alignments. It is not giving the header for some reason, is that what you expected? Just opening
align.sam
in a text editor, I can see the actual header, which looks like:@HD VN:1.0 SO:unsorted
@SQ SN: LN:877
@PG ID:ngmlr PN:nextgenmap-lr VN:0.2.6 CL:ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont
(Notice that there is nothing after
SN:
).Just so I summarise the sequence of commands I am doing right now:
ngmlr -t 4 -r ref.fasta -q reads.fastq -o align.sam -x ont
samtools view align.sam -o align.bam
samtools sort align.bam > sort.bam
samtools index sort.bam
Genomes
>Load genome from file
. And I selectref.fasta
.sort.bam
in IGV.Error loading BAM file: htsjdk.samtools.SAMFormatException: Invalid BAM file header: missing sequence name in file
Does the name of the reference need to match something in the
.sam
file by any chance?