Recently I have been using SVDetect to characterize somatic structural variations in my cancer whole-exome data. It detected many small inverted duplications (INV_DUPLI), about 25,000 per sample. This number is not what I expected, because:
1) some papers report fewer than 10 inverted duplications per tumor genome, so I would expect a much smaller number;
2) most, if not all, of them have the same configuration: both reads of each pair map to exactly the same position on the same strand (highlighted in bold in the following example).
**chr10 100249595 100249926 chr10 100249595 100249926** 5 (HWI-ST318_241:1:1106:5579:4835,HWI-ST318_241:1:2108:20241:159703,HWI-ST318_241:1:1107:7278:144293,HWI-ST318_241:1:2207:15310:64471,HWI-ST318_241:1:1202:12448:19905) **(F,F,F,F,F) (F,F,F,F,F)** (1,2,1,2,1) (2,1,2,1,2) (1,2,3,4,5) (1,2,3,4,5) (100249596,100249625,100249748,100249824,100249852) (100249596,100249625,100249748,100249824,100249852) INV_DUPLI 0 5/6 UNBAL 5/5 (100249926,100250064) (100249926,100250064) 0.833333333333333 6 131665.1
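To check how many of the calls share the configuration highlighted above, one could scan the links file for records whose two ends are identical. The sketch below is a minimal Python illustration, assuming whitespace-separated columns laid out as in the example record. The column indices are inferred from that single line, not taken from the SVDetect documentation, and `is_self_link` and `count_self_links` are hypothetical helper names.

```python
# Assumed column layout, inferred from the example record only:
#   0-2   first region (chr, start, end)
#   3-5   second region (chr, start, end)
#   8, 9  read strands at each end, e.g. "(F,F,F,F,F)"
#   14,15 read positions at each end, e.g. "(100249596,...)"


def parse_list(field: str) -> list:
    """Turn a parenthesized field like "(F,F,F)" into ["F", "F", "F"]."""
    return field.strip("()").split(",")


def is_self_link(line: str) -> bool:
    """True if both ends share region, strands, and read positions."""
    cols = line.split()
    same_region = cols[0:3] == cols[3:6]
    same_strands = parse_list(cols[8]) == parse_list(cols[9])
    same_positions = parse_list(cols[14]) == parse_list(cols[15])
    return same_region and same_strands and same_positions


def count_self_links(lines) -> int:
    """Count self-linked records, skipping blank lines and '#' comments."""
    return sum(
        is_self_link(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    )
```

Passing an open links file to `count_self_links` would give the number of records with this configuration. Comparing that count against the total number of INV_DUPLI calls would show whether "most, if not all" of them really match this pattern.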
Has anyone worked with SVDetect and run into this? Should we just discard all of these calls, or keep a few of them according to certain criteria? In addition, I cannot imagine what could produce this kind of read configuration in paired-end sequencing. Any ideas?
Thank you very much.