Apologies if this has already been asked; I've looked around and haven't been able to get a clear, definitive answer to this question.
I have a number of interleaved paired-end FASTQ files that are sorted. I read in a (somewhat dated) GATK tutorial that the reads in these files should be sorted randomly (keeping the pairs together, one would assume) prior to alignment with BWA-mem, lest a bias be introduced. Is this correct?
Assuming it is, is there a tool that you know of / can recommend to sort interleaved paired-end FASTQs in such a way? I know there are scripts that can be written, but I'd prefer to stand on the shoulders of a (verified, preferably open-source) giant.
Many thanks!
It's true if the FASTQ files were generated from a previously sorted BAM: the reads are then in a non-random order, and since BWA-MEM estimates the insert-size distribution from batches of reads in input order, the estimate of the average fragment length ends up biased.
That is the case with the files in question; they were reverted from aligned BAMs to FASTQs. Is there any software you could recommend to shuffle them? Thanks!
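For what it's worth, here is a minimal sketch of the shuffle being asked about, not a replacement for an established tool. It assumes plain four-line FASTQ records (no wrapped sequence lines), strictly interleaved mates (R1 immediately followed by R2), and a file small enough to hold in memory. The function name `shuffle_interleaved_fastq` and the `seed` parameter are made up for illustration.

```python
import random
from typing import Iterable, List


def shuffle_interleaved_fastq(lines: Iterable[str], seed: int = 0) -> List[str]:
    """Shuffle read pairs of an interleaved FASTQ, keeping mates adjacent.

    Assumes 4 lines per record and 2 records per pair, i.e. 8 lines per pair.
    """
    stripped = [ln.rstrip("\n") for ln in lines]
    if len(stripped) % 8 != 0:
        raise ValueError("expected a multiple of 8 lines (4 per read, 2 reads per pair)")

    # Group the file into 8-line blocks, one block per read pair.
    pairs = [stripped[i:i + 8] for i in range(0, len(stripped), 8)]

    # Shuffle whole pairs, so R1 and R2 always travel together.
    random.Random(seed).shuffle(pairs)

    # Flatten back into a list of lines.
    return [ln for pair in pairs for ln in pair]
```

It could be used by passing an open file handle (for example `open("reads.fq")`, a hypothetical path) and writing the returned lines back out joined with newlines. For large files an in-memory shuffle like this will not scale, so a dedicated tool is the better choice there.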