9.5 years ago
bioinfo
Hi, I have been trying to map over 50 million short (100 bp) reads (referred to as reads.fasta) to 4 reference genes (~1000 bp each) in a single file (referred to as reference.fasta) using Bowtie2.
bowtie2-build -f reference.fasta Bowtie.mapping (INDEXING DATABASE, INDEX NAME)
bowtie2 -x Bowtie.mapping -p 16 -f -U reads.fasta -S file.sam (BOWTIE RUN)
samtools view -bS file.sam > file.bam (SAM TO BAM)
samtools sort file.bam file.bam.sorted (SORTING BAM FILE)
samtools index file.bam.sorted.bam (INDEXING BAM FILE)
The .sam file looks like this. I am not sure whether it is correct, and I do not understand a few of the fields below.
HISEQ:205:C4GL1ACXX:1:1101:8328:2446 4 * 0 0 * * 0 0 GCCATTCGCGGTTGCAGGGCCTCCATCATTTGCTGTGGCTGCACCGCAGGCGCTTCCTGGAACGTCAACCCTCGTTGCGCCCTCACTGCATCATGCTCCTC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
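For readers unsure about the fields in that record, here is a minimal sketch that labels the eleven mandatory SAM columns and decodes a few FLAG bits. The column names and bit values follow the SAM specification; the helper names `name_fields` and `describe_flag` are hypothetical, and only a handful of bits are covered. In the record above, FLAG 4 together with `*` for the reference name and CIGAR means the read is unmapped, and `YT:Z:UU` is Bowtie2's tag for a read that was not part of a pair.

```shell
# Label the 11 mandatory SAM columns of each record read from stdin.
# (Hypothetical helper; column names are from the SAM specification.)
name_fields() {
    awk '{ n = split("QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL", names, " ")
           for (i = 1; i <= n; i++) printf "%s=%s\n", names[i], $i }'
}

# Decode a few common FLAG bits (not the full set).
describe_flag() {
    flag=$1
    out=""
    [ $((flag & 1)) -ne 0 ]   && out="$out paired"
    [ $((flag & 4)) -ne 0 ]   && out="$out unmapped"
    [ $((flag & 16)) -ne 0 ]  && out="$out reverse_strand"
    [ $((flag & 256)) -ne 0 ] && out="$out secondary"
    echo "${out# }"
}

describe_flag 4    # prints: unmapped
```

A single unmapped record does not mean the run failed: with 50 million reads and only four short references, many reads may simply have no match. Something like `samtools view -c -F 4 file.bam` counts the mapped records.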
However, the indexed BAM file seems to be wrong, and I get this message for the .bai file.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "file.sorted.bam.bai".
So, "EOF marker is absent" is a bug in Samtools and so not a problem here, but the BAM file has no header. Does an extra -h flag during SAM-to-BAM conversion help to add the header?
samtools view -bS -h file.sam > file.bam (SAM TO BAM)
UPDATE: I tried with the -h flag but it didn't help!
-h isn't necessary for SAM-to-BAM. But I thought 'EOF marker is absent' is only a bug when reading from STDIN. If so, something probably went wrong during one of the previous steps. Can you read the header from the unsorted BAM (samtools view -H)?
Yes. I can read the header from both sorted and unsorted BAM files. Somehow only the indexing of the BAM file is not working.
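The third error message names a `.bai` file as the input. One possible reading, an assumption since the exact command is not shown, is that `samtools view` was run on the index rather than on the BAM. A rough sketch for checking what a file actually is, using only standard tools: per the SAM/BAM specification, a BAI index starts with the bytes `BAI\1`, while a BAM is a BGZF (gzip-compatible) file that starts with `1f 8b` and should end with a fixed 28-byte empty block, the "EOF marker". The function names `file_kind` and `has_bgzf_eof` are hypothetical.

```shell
# The 28-byte BGZF end-of-file block from the SAM/BAM specification, as hex.
BGZF_EOF="1f8b08040000000000ff0600424302001b00"
BGZF_EOF="${BGZF_EOF}03000000000000000000"

# Classify a file by its first four bytes.
file_kind() {
    first4=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case "$first4" in
        42414901) echo "bai" ;;    # "BAI\1": a BAM index, not a BAM
        1f8b*)    echo "bgzf" ;;   # gzip/BGZF container, e.g. a BAM
        *)        echo "other" ;;
    esac
}

# Succeed only if the file ends with the BGZF EOF block.
has_bgzf_eof() {
    last28=$(tail -c 28 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$last28" = "$BGZF_EOF" ]
}

# Example usage (file names taken from the thread):
#   file_kind file.bam.sorted.bam.bai
#   has_bgzf_eof file.bam.sorted.bam && echo "EOF marker present"
```

If `file_kind` reports `bai` for the file that `samtools view` complained about, both header errors would be expected, because an index is not a BAM file and has no BAM header or EOF block.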
I put your header and the one read line you posted into a file, then sorted and indexed it. As expected, no problems there...
Just to be thorough, you posted:
samtools sort file.bam file.bam.sorted
which produces "file.bam.sorted.bam", but you tried to use the index of "file.sorted.bam". That's probably just a posting issue...
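To make the naming point concrete, here is a small sketch of how the file names follow from the commands in the question. It assumes the legacy behaviour of `samtools sort in.bam PREFIX` (writes `PREFIX.bam`) and of `samtools index x.bam` (writes `x.bam.bai`); the helper names `sorted_name` and `index_name` are hypothetical, and no samtools call is made.

```shell
# Legacy `samtools sort in.bam PREFIX` writes PREFIX.bam.
sorted_name() { echo "$1.bam"; }
# `samtools index x.bam` writes x.bam.bai next to the BAM.
index_name()  { echo "$1.bai"; }

prefix="file.bam.sorted"                  # prefix used in the question
sorted_bam=$(sorted_name "$prefix")
index=$(index_name "$sorted_bam")

echo "$sorted_bam"   # file.bam.sorted.bam
echo "$index"        # file.bam.sorted.bam.bai
```

Neither name matches the "file.sorted.bam.bai" that appears in the error message, which is why the mismatch is worth ruling out.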
That could be just a typing mistake. I have rerun all the steps and am still struggling with indexing the sorted BAM file, though I can read the headers from the SAM, unsorted BAM and sorted BAM files. I am surprised that you got it to work in a test run with only the header and one read line. I will try that first as well.