Can anyone suggest how to use samtools to keep only hits where an insertion in the reference splits the aligned sequence roughly in the middle?
My sequences are in the range of 100-1000bp and were aligned using "bwa bwasw -z 100".
I don't expect perfect hits, so mismatches and small indels can occur at both ends of the hit, but I am looking for insertions in the reference, near the middle of the hit, that are at least 10x larger than any of the small indels at the ends.
I don't have paired ends.
So you are assuming that only one read aligns around a given indel? If so, I think you might need to write some code. If, on the other hand, your depth is more than that, it might be useful to call the indels using something like Dindel rather than relying on ad hoc processing.
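If you do go the "write some code" route, here is a minimal sketch of a per-read filter on the CIGAR string. It is not a samtools feature, just a small script that reads SAM text. It assumes that an "insertion in the reference" shows up as a `D` (or `N`) operation inside a single alignment's CIGAR; if the aligner reports the two halves as separate hits instead, this will not find them. The function names, the 25%-75% window for "roughly the middle", and the `min_gap` parameter are my own choices for illustration, not anything from the question.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Turn '50M200D50M' into [(50, 'M'), (200, 'D'), (50, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def has_central_reference_gap(cigar, min_ratio=10, min_gap=1, lo=0.25, hi=0.75):
    """True if the largest D/N gap sits near the middle of the aligned read
    and is at least min_ratio times larger than every other indel.

    min_ratio, min_gap, lo and hi are illustrative thresholds (assumptions).
    """
    ops = parse_cigar(cigar)
    ref_gaps = [n for n, op in ops if op in "DN"]
    if not ref_gaps:
        return False
    biggest = max(ref_gaps)
    if biggest < min_gap:
        return False

    # Every other indel (I, D or N) must be at least min_ratio times smaller.
    others = [n for n, op in ops if op in "IDN"]
    others.remove(biggest)
    if others and biggest < min_ratio * max(others):
        return False

    # Aligned read bases: ops that consume the query, excluding clipping.
    aligned = sum(n for n, op in ops if op in "MI=X")
    if aligned == 0:
        return False

    # Find where along the aligned read the big gap falls.
    consumed = 0
    for n, op in ops:
        if op in "DN" and n == biggest:
            return lo <= consumed / aligned <= hi
        if op in "MI=X":
            consumed += n
    return False


def filter_sam(lines, **thresholds):
    """Yield header lines unchanged, plus records that pass the CIGAR test."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # CIGAR is the 6th mandatory SAM column.
        if len(fields) > 5 and has_central_reference_gap(fields[5], **thresholds):
            yield line
```

To use it as a filter, save it as a script (say `filter_mid_gap.py`, a made-up name), add `import sys` and `sys.stdout.writelines(filter_sam(sys.stdin))` at the bottom, and pipe SAM through it, for example `samtools view -h in.bam | python filter_mid_gap.py | samtools view -b -o filtered.bam -` (older samtools releases may also need `-S` when reading SAM input). With only one read per event this is still ad hoc, so treat the hits as candidates rather than calls.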
What kind of coverage are you talking about here? What data processing do you use to produce the alignments?
@Sean Davis: I added the details to the question. My sequences are in the range of 100-1000bp and were aligned using "bwa bwasw -z 100".