Hello all :)
Is there any way to check that a BAM file is complete and obeys the BAM specification?
I have created 100 processed BAMs from 100 unprocessed BAMs, and before I delete the unprocessed data, I'd like to make sure my processed data is OK :)
Of course, some problems would be impossible to detect, e.g. I might have mapped all these BAMs to the wrong genome. I'm not looking for those sorts of errors. I'm mainly looking for truncated BAMs, or BAMs where the header doesn't match the read data. It would also be nice to know that the reads are sorted properly, but that's not a huge concern.
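To make the "truncated BAM" part concrete, here is a rough sketch of the cheapest check I can think of, using only the Python standard library. It relies on two facts from the SAM/BAM specification: a complete BAM ends with a fixed 28-byte empty BGZF block, and the decompressed stream starts with the magic bytes `BAM\1`. The function names are my own, and this only catches files that were cut off or are not BAM at all; it says nothing about whether the header matches the reads.

```python
import gzip
import zlib

# The 28-byte empty BGZF block that marks the end of a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end to learn the file size
        if handle.tell() < len(BGZF_EOF):
            return False
        handle.seek(-len(BGZF_EOF), 2)  # read only the last 28 bytes
        return handle.read() == BGZF_EOF


def has_bam_magic(path):
    """Return True if the decompressed stream starts with the BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError, zlib.error):
        # Not gzip/BGZF at all, or damaged right at the start.
        return False
```

Looping `has_bgzf_eof` and `has_bam_magic` over the 100 processed files would flag the obviously broken ones before anything gets deleted.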
Alternatively, taking a totally different approach, is there something like a diff of the BAMs (input vs. output) that would tell me what changed during processing? I could then check whether that matches what I'd expect.
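The kind of diff I have in mind could be as simple as the sketch below. It assumes both BAMs have already been converted to SAM text (for example with `samtools view`) and compares only how many records each read name has, so it shows reads that were lost or gained during processing but not changes to flags or alignments. The function names are hypothetical.

```python
from collections import Counter


def read_name_counts(sam_lines):
    """Count alignment records per read name (QNAME) in SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        # Header lines start with "@"; a QNAME can never contain "@".
        if line.startswith("@") or not line.strip():
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts


def compare_read_names(before_lines, after_lines):
    """Return (lost, gained) Counters of records per read name."""
    before = read_name_counts(before_lines)
    after = read_name_counts(after_lines)
    # Counter subtraction keeps only positive differences.
    return before - after, after - before
```

If processing is only supposed to filter reads, `gained` should come back empty and `lost` should be roughly the number of records I expected to drop.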
All the best :)
Oh, that's perfect, since I'm already using Picard for everything else :) Thanks Pierre!