A BAM file contains a description of reads aligned against a reference sequence, plus other information. When converting to FASTA/FASTQ format, these two tools
samtools fasta in.bam > out.fasta
picard SamToFastq I=in.bam F=out.fastq
drop all of the alignment information. Is there another tool that can do this conversion and directly produce a multiple sequence alignment in FASTA format? It is acceptable, indeed preferable, if because of insertions it ends up looking something like this:
ACGTT-ACGTTGCA
ACGT--ACGTTGCA
ACGT--ACGTTGGA
ACGT--ACGTTGCA
ACGT--ACGTTGCA
ACGTAAACGTTGCA
ACGT--ACGTTGCA reference sequence
as opposed to (same alignment, all insertions dropped)
ACGTACGTTGCA
ACGTACGTTGCA
ACGTACGTTGGA
ACGTACGTTGCA
ACGTACGTTGCA
ACGTACGTTGCA
ACGTACGTTGCA reference sequence
Yes, one could realign the FASTA reads against the reference sequence, but since that would not use the same aligner that built the BAM file, the two representations would usually not end up with exactly the same alignment.
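No realignment should be needed in principle, because each BAM record already stores its 0-based reference start and CIGAR string, which fully determine where every base goes. The following is a minimal Python sketch of that conversion, not an existing tool: `parse_cigar`, `project`, `to_msa` and `write_fasta` are hypothetical helper names, reads are assumed to be given as (name, 0-based start, CIGAR, sequence) tuples lying entirely within the reference, and reference positions a read does not cover are filled with '-'. With `keep_insertions=True`, each insertion slot is widened to the longest insertion at that position (the first layout above); with `False`, insertions are simply dropped (the second layout).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical input format: (read name, 0-based reference start, CIGAR, read sequence)
Read = Tuple[str, int, str, str]

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '4M1I8M' into [(4, 'M'), (1, 'I'), (8, 'M')]."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def project(start: int, cigar: str, seq: str) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Place one read on reference coordinates.

    Returns (bases, inserts): bases maps a reference position to the read's base
    there ('-' for a deletion); inserts maps a reference position to the bases
    the read inserts immediately before that position.
    """
    bases: Dict[int, str] = {}
    inserts: Dict[int, str] = {}
    r, q = start, 0  # current reference and query offsets
    for n, op in parse_cigar(cigar):
        if op in "M=X":  # aligned: consumes reference and query
            for i in range(n):
                bases[r + i] = seq[q + i]
            r += n
            q += n
        elif op == "I":  # insertion: consumes query only
            inserts[r] = inserts.get(r, "") + seq[q:q + n]
            q += n
        elif op in "DN":  # deletion / skipped region: consumes reference only
            for i in range(n):
                bases[r + i] = "-"
            r += n
        elif op == "S":  # soft clip: bases are in seq but not aligned
            q += n
        # H (hard clip) and P (padding) consume neither reference nor seq here
    return bases, inserts


def to_msa(ref: str, reads: List[Read], keep_insertions: bool = True) -> List[Tuple[str, str]]:
    """Return (name, gapped row) pairs for every read, followed by the reference."""
    placed = [(name, *project(start, cigar, seq)) for name, start, cigar, seq in reads]
    # width[p] = longest insertion before reference position p (p == len(ref): after the end)
    width = [0] * (len(ref) + 1)
    if keep_insertions:
        for _, _, inserts in placed:
            for pos, ins in inserts.items():
                width[pos] = max(width[pos], len(ins))

    def render(bases: Dict[int, str], inserts: Dict[int, str]) -> str:
        out = []
        for pos in range(len(ref) + 1):
            if width[pos]:  # insertion slot, left-justified and padded with '-'
                out.append(inserts.get(pos, "").ljust(width[pos], "-"))
            if pos < len(ref):  # the reference column itself ('-' if not covered)
                out.append(bases.get(pos, "-"))
        return "".join(out)

    rows = [(name, render(bases, inserts)) for name, bases, inserts in placed]
    rows.append(("reference", render(dict(enumerate(ref)), {})))
    return rows


def write_fasta(rows: List[Tuple[str, str]]) -> str:
    """Format (name, row) pairs as aligned FASTA text."""
    return "".join(f">{name}\n{row}\n" for name, row in rows)


if __name__ == "__main__":
    ref = "ACGTACGTTGCA"
    reads = [
        ("r1", 0, "4M1I8M", "ACGTTACGTTGCA"),
        ("r2", 0, "12M", "ACGTACGTTGCA"),
        ("r3", 0, "12M", "ACGTACGTTGGA"),
        ("r4", 0, "4M2I8M", "ACGTAAACGTTGCA"),
    ]
    print(write_fasta(to_msa(ref, reads)))
```

On these four toy reads the padded rows come out as ACGTT-ACGTTGCA, ACGT--ACGTTGCA, ACGT--ACGTTGGA, ACGTAAACGTTGCA and ACGT--ACGTTGCA for the reference, matching the first layout. For real data, the tuples could be built with pysam from each `AlignedSegment`'s `query_name`, `reference_start`, `cigarstring` and `query_sequence`, skipping records where `is_unmapped` or `is_secondary` is set; that wiring is an assumption and is not shown here.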
What's your goal? Why would you want this format?
Among other things, I prefer other alignment viewers to IGV or Tablet, and these accept aligned FASTA but not BAM.
But, say for a human genome, every line would be as long as chr1 (about 250 million bp), on top of having to manage the insertions?
The alignments in question here are only up to tens of thousands of base pairs, and some of them have only tens of reads.
But what about other regions with hundreds or thousands of reads? You are likely to crash multiple-alignment viewers, or at least make them very slow. Anyway, what you are asking for is quite tricky to implement and of little use to others. You should try to find other solutions.
Are you just trying to visualize the alignments? You could simply open the BAM in the Integrative Genomics Viewer (IGV) to get a nice visualization of the read alignments.
See if the answer in this thread helps.