As part of their submission requirements, ENA/ArrayExpress require BAMs to include all reads (i.e. unmapped ones as well). TopHat, in its wisdom, writes unmapped reads to a separate BAM instead of including them in its accepted_hits.bam output.
However, creating a 'complete' BAM is not so straightforward because of problems in the unmapped-reads BAM (detailed on SeqAnswers). I've tried the script mentioned on SeqAnswers, but it doesn't solve all the problems for me: I still get unmapped reads which don't have a mate in accepted_hits.bam.
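For illustration, here is a minimal sketch of one possible repair step. It is not the SeqAnswers script. It assumes the remaining problem is that some paired reads in the unmapped BAM still carry flags claiming a mapped mate (0x8 not set) even though no read with the same name appears in accepted_hits.bam. It also assumes both files are available as SAM text (e.g. via `samtools view -h`). The function names `mapped_qnames` and `fix_orphans` are hypothetical. For each such orphan, the sketch sets "mate unmapped" (0x8), clears "proper pair" (0x2), and resets RNEXT/PNEXT/TLEN.

```python
# Hypothetical helpers, not part of TopHat or the SeqAnswers script.
# They operate on plain SAM text lines (e.g. from `samtools view -h`).

PAIRED = 0x1
PROPER_PAIR = 0x2
MATE_UNMAPPED = 0x8


def mapped_qnames(sam_lines):
    """Collect the read names present in a SAM stream such as accepted_hits."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def fix_orphans(unmapped_lines, mapped):
    """Return the unmapped SAM lines, repairing paired reads whose flag says
    the mate is mapped although no record with that name is in `mapped`."""
    out = []
    for line in unmapped_lines:
        line = line.rstrip("\n")
        if line.startswith("@") or not line.strip():
            out.append(line)  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        orphan = (flag & PAIRED) and not (flag & MATE_UNMAPPED) and qname not in mapped
        if orphan:
            flag = (flag | MATE_UNMAPPED) & ~PROPER_PAIR
            fields[1] = str(flag)
            # RNEXT, PNEXT, TLEN: no mapped mate to point at any more
            fields[6], fields[7], fields[8] = "*", "0", "0"
        out.append("\t".join(fields))
    return out
```

The repaired records could then be converted back to BAM and combined with accepted_hits.bam (e.g. with `samtools merge`). Whether this satisfies the ENA/ArrayExpress validators is untested here.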
Given TopHat's popularity, this should be a solved problem, right? So how does everyone deposit their TopHat alignments for publication? Or are FASTQs the preferred format?
Any suggestions?