I want to run some computations with a Python script directly on some BWA-aligned BAM files, and to do this I need to remove the soft-clipped bases. For example, if the CIGAR string and read are: 2S8M CCTGGAGAAT I want to clip them so they become: 8M TGGAGAAT
I tried to do this using ClipReads in GATK, but the hard-clip option throws errors and is unsupported.
Is there any way to remove the soft-clipped bases with another piece of software?
Because my coverage is not very high, I don't want to disable soft clipping in BWA, as I would lose a lot of coverage.
Thanks
Can't you just parse the CIGAR string in your script and take care of it there?
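Something along these lines might do it: a minimal sketch in plain Python, assuming you already have the CIGAR string, the read sequence, and optionally the quality string as text (e.g. from the SAM columns). The function name `strip_soft_clips` is just made up for illustration; it is not part of any library. It drops the `S` operations from the CIGAR and trims the matching bases from either end of the read. As far as the SAM spec goes, POS already points at the first aligned base, so it should not need adjusting.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def strip_soft_clips(cigar, seq, qual=None):
    """Return (cigar, seq, qual) with soft-clipped bases removed.

    Hypothetical helper for illustration. Assumes a well-formed CIGAR
    in which soft clips only occur at the ends of the read.
    """
    if cigar == "*" or seq == "*":
        # Unmapped read or sequence not stored: nothing to trim.
        return cigar, seq, qual

    start, end = 0, len(seq)
    kept = []
    seen_aligned = False  # True once a non-clipping operation was seen

    for length, op in ((int(n), o) for n, o in CIGAR_OP.findall(cigar)):
        if op == "S":
            if seen_aligned:
                end -= length    # trailing soft clip
            else:
                start += length  # leading soft clip
            continue
        if op != "H":
            seen_aligned = True
        kept.append(f"{length}{op}")

    new_qual = qual[start:end] if qual is not None else None
    return "".join(kept), seq[start:end], new_qual


if __name__ == "__main__":
    print(strip_soft_clips("2S8M", "CCTGGAGAAT"))
```

Hard clips (`H`) are left in the CIGAR untouched here, since those bases are not in the read sequence anyway. If you read the BAM through a library rather than as text, the same logic applies to whatever CIGAR representation that library gives you.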