Hi,
I have paired-end NGS data from a fruit fly population, and I am trying to detect inversions from the paired-end insert size, on the premise that a much larger insert size will be observed in the presence of an inversion. (A breakpoint between the two reads of a pair will increase the apparent insert size when they are aligned to the reference genome.)
But I find the situation is much more complicated than I expected. The reads can be mapped in different ways, e.g. as supplementary alignments or chimeric reads. I also noticed that the SAM flag (the second column in SAM files) provides this kind of information, but I am not clear on how to filter reads according to these flags.
My question is: in order to infer inversions based on insert size, how should I filter reads?
Thanks in advance!
Exactly, this is not trivial at all, and I strongly encourage you to use dedicated software for it, such as lumpy (or any other structural variant caller). Naive approaches might work, but they take time to develop and must be tested. Use standard software!
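That said, if you want to see what the flags actually encode before handing the data to a caller, here is a minimal Python sketch. The bit values come from the SAM specification; everything else is an assumption for illustration: the function name `is_inversion_candidate`, the `min_mapq=20` cutoff, and the choice to look at pair orientation (both mates on the same strand) in addition to insert size. It is not a replacement for a structural variant caller.

```python
# SAM flag bits as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

# Records carrying any of these bits are dropped before looking at orientation.
EXCLUDE = (UNMAPPED | MATE_UNMAPPED | SECONDARY
           | QC_FAIL | DUPLICATE | SUPPLEMENTARY)


def is_inversion_candidate(sam_line, min_mapq=20):
    """Hypothetical helper: True if a SAM record is a primary, well-mapped
    read whose mate lies on the same chromosome AND the same strand.

    min_mapq is an arbitrary illustrative threshold, not a recommendation.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    rname = fields[2]       # column 3: reference name
    mapq = int(fields[4])   # column 5: mapping quality
    rnext = fields[6]       # column 7: mate reference ("=" means same as RNAME)

    if not flag & PAIRED:
        return False
    if flag & EXCLUDE:
        return False
    if mapq < min_mapq:
        return False
    if rnext not in ("=", rname):
        return False
    # A normal pair has one mate forward and one reverse; same-strand pairs
    # are the orientation signal one would inspect for inversions.
    return bool(flag & REVERSE) == bool(flag & MATE_REVERSE)


if __name__ == "__main__":
    # Made-up record for demonstration only: flag 65 = paired + first in pair,
    # both mates on the forward strand.
    demo = "r1\t65\t2L\t1000\t60\t100M\t=\t50000\t49100\t*\t*"
    print(is_inversion_candidate(demo))
```

The exclusion part alone can also be done on the command line with `samtools view -f 1 -F 3852`, where 3852 is the sum of the excluded bits above (4 + 8 + 256 + 512 + 1024 + 2048). Even with such a filter, repeats and mismapped reads produce plenty of false candidates, which is one reason a dedicated caller is the safer route.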