Dear all,
I work with paired-end amplicon data. Following the recommendations of the bioinformatics community, I soft-clip primer sequences from BAM files (see my post if you are looking for tools designed to perform this task).
So far, I've tried soft-clipping primer sequences with BamClipper and Katana. Both tools introduced errors into the BAM files. Here are example outputs of ValidateSamFile (MODE=SUMMARY, IGNORE_WARNINGS=false):
- After BamClipper:
HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT 60
ERROR:INVALID_MAPPING_QUALITY 129
ERROR:MISMATCH_FLAG_MATE_UNMAPPED 78
ERROR:MISMATCH_MATE_ALIGNMENT_START 4934
ERROR:MISMATCH_MATE_CIGAR_STRING 1427718
WARNING:MISSING_TAG_NM 1428978
- After Katana:
HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_UNALIGNED_MATE_START 54543
ERROR:MISMATCH_MATE_CIGAR_STRING 2146704
Are such errors expected after soft-clipping primer sequences? Why do they occur? And, more practically, how can I fix them?
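P.S. In case anyone wants to compare the two summaries programmatically, here is a minimal Python sketch that parses the ValidateSamFile SUMMARY text into counts and lists the error types reported for both tools. It is not part of BamClipper, Katana, or Picard. The function name `parse_validation_summary` is hypothetical, and the sketch assumes each data line has the form `SEVERITY:TYPE count`, as in the outputs above.

```python
import re

def parse_validation_summary(text):
    """Parse ValidateSamFile SUMMARY text into {(severity, error_type): count}.

    Assumes data lines look like 'ERROR:SOME_TYPE 123' (whitespace- or
    tab-separated); header lines and anything else are skipped.
    """
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(ERROR|WARNING):(\S+)\s+(\d+)\s*$", line)
        if m:
            severity, error_type, count = m.groups()
            counts[(severity, error_type)] = int(count)
    return counts

# Summaries copied from the outputs above.
bamclipper_summary = """Error Type Count
ERROR:INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT 60
ERROR:INVALID_MAPPING_QUALITY 129
ERROR:MISMATCH_FLAG_MATE_UNMAPPED 78
ERROR:MISMATCH_MATE_ALIGNMENT_START 4934
ERROR:MISMATCH_MATE_CIGAR_STRING 1427718
WARNING:MISSING_TAG_NM 1428978
"""

katana_summary = """Error Type Count
ERROR:INVALID_UNALIGNED_MATE_START 54543
ERROR:MISMATCH_MATE_CIGAR_STRING 2146704
"""

bamclipper = parse_validation_summary(bamclipper_summary)
katana = parse_validation_summary(katana_summary)

# Error types that both tools produce, with their counts side by side.
shared = sorted(set(bamclipper) & set(katana))
for severity, error_type in shared:
    key = (severity, error_type)
    print(f"{severity}:{error_type}\tBamClipper={bamclipper[key]}\tKatana={katana[key]}")
```

The only entry common to both outputs is MISMATCH_MATE_CIGAR_STRING, which is also by far the most frequent error in each.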