Hello,
We've got some BAM files generated by Complete Genomics, and when we try to validate them with samtools, (a) they take an incredibly long time to validate (approx 100 hours), and (b) we get billions of validation errors indicating that a paired end read is missing its mate pair. I'm trying to figure out if something is wrong with these BAMs, or if for some reason samtools and Complete Genomics BAMs are just a bad combination. Anyone have experience with them?
Thanks!
What exactly do you mean by 'validate' using samtools? Normally, an aligner will still write the unmapped mate of a read pair to the BAM file. It's reasonable to discard the reads of a pair if neither of them is mapped. But when one end is mapped and the other remains unmapped, the sequence of the unmapped read can be used to detect indels with a split-read method. You would still be able to call SNPs and indels using methods other than the split-read method.
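To see which of these cases your reads fall into, you could tally the pairing status from the FLAG column. Below is a rough sketch in plain Python, assuming you feed it SAM text lines (for example the output of `samtools view` on your BAM). The bit values come from the SAM specification; the function names `classify_pair` and `count_pair_states` are made up for illustration and are not part of samtools.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def classify_pair(flag):
    """Return the pairing/mapping state encoded in a SAM FLAG value."""
    if not flag & PAIRED:
        return "unpaired"
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both_unmapped"
    if read_unmapped or mate_unmapped:
        return "one_end_mapped"
    return "both_mapped"


def count_pair_states(sam_lines):
    """Tally pair states over SAM text lines, skipping '@' header lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts[classify_pair(flag)] += 1
    return counts
```

Something like `samtools view your.bam | python this_script.py` with `count_pair_states(sys.stdin)` would give the totals. A large "one_end_mapped" count would be consistent with what I described above; it would not by itself explain errors about a mate record being missing from the file.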