Hi all
So I have used BWA to align fastq files (mate pair, Illumina HiSeq) to a reference sequence. I then extracted the unmapped reads using samtools which gives me a bam file.
I would like to take this bam file and map it back to my reference sequence using BWA again. However bam files don't seem compatible - I do get an output sam file but the flagstat and other downstream tests then do not work. I have tried converting the bam to fastq and to fasta, same story. Though I get an output sam file from BWA alignment, I can either not visualize the sam file or downstream tests yield no results?
Is there something I'm missing here? Is it the paired end reads that are messing things up? Has anyone done something like this before?
Thanks!
If you use bwa, you should run bamshuf first.
Oh good point lh3!
I remember reading somewhere than in some tools you have to shuffle the reads but also sort them by name so the pairs are together in the BAM file (but pairs are still randomly distributed). Im not sure if that's still required by bedtools's bamtofastq or other tools these days. Do you know anything about that? It's been a while since I did a bam -> fastq.
Bamshuf groups reads by names. Bwa reads a batch of reads and infers insert size distribution from the batch. When input reads are coordinate sorted, the inference is inaccurate and sometimes may fail. For bwa, it is important to use bamshuf. Those bam2fastq won't work well.
As a quick caveat, bamshuf has been renamed to collate in samtools. This is what I needed as well! Thanks!