Hi, the following is an example of BreakDancer output for a deletion, from a run comparing a normal and a treated condition for a genome.
Chr1 Pos1 Orientation1 Chr2 Pos2 Orientation2 Type Size Score num_Reads num_Reads_lib addRG_HN002.bam addRG_nanomax_pe.bam
Gm01 2613378 16+0- Gm01 2615298 0+15- DEL 1842 98 5 addRG_nanomax_pe.bam|5 0.07 0.09
If I want to know the size of the deletion, and where it starts and ends, how can I get or infer that from this output? Subtracting Pos1 from Pos2 does not match the reported size: in this case the size is 1842, but Pos2 minus Pos1 gives 1920.
The predicted size of the deletion is 1842. Because BreakDancer only examines read pairs spanning the event, the reported coordinates are the outermost boundaries of the variant, not its exact breakpoints. (The span should always be larger than the size of any deletion and can be smaller than the size of an insertion.) That is why Pos2 minus Pos1 gives 1920 here rather than 1842. If you want exact boundaries, you should try an assembly-based method like TigraSV, or something that uses softclipped reads like SquareDancer or CREST.
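To make the difference between span and size concrete, here is a minimal Python sketch that parses the header and data row quoted in the question. The helper name `parse_call` is made up for illustration and is not part of BreakDancer; the sketch assumes the columns are whitespace-separated exactly as pasted above.

```python
# Header and data row copied verbatim from the question.
header = ("Chr1 Pos1 Orientation1 Chr2 Pos2 Orientation2 Type Size Score "
          "num_Reads num_Reads_lib addRG_HN002.bam addRG_nanomax_pe.bam")
row = ("Gm01 2613378 16+0- Gm01 2615298 0+15- DEL 1842 98 5 "
       "addRG_nanomax_pe.bam|5 0.07 0.09")


def parse_call(header_line, data_line):
    """Pair each column name with its value (hypothetical helper)."""
    return dict(zip(header_line.split(), data_line.split()))


call = parse_call(header, row)

# Span: distance between the two reported outer coordinates.
span = int(call["Pos2"]) - int(call["Pos1"])
# Size: BreakDancer's estimate of the deleted length.
size = int(call["Size"])
# Slack: how much wider the reported interval is than the estimated deletion.
slack = span - size

print(f"span={span} size={size} slack={slack}")
```

The slack is the amount by which the reported interval overshoots the estimated deletion, so neither Pos1 nor Pos2 should be read as an exact breakpoint.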
Useful links:
http://genome.cshlp.org/content/24/2/310
http://gmt.genome.wustl.edu/tigra-sv/0.1/index.html
https://github.com/genome/gms-core/blob/master/lib/perl/Genome/Model/Tools/Sv/SquareDancer.pl
http://www.stjuderesearch.org/site/lab/zhang
The deletion call above is only for the treated condition. So should I add 1842 to, or subtract 1842 from, the respective position to get the putative region of the deletion? That's my question.
thanks
PS: Most of these tools are not published apart from CREST so I am wary of using them.
Tigra was published and Ken Chen's group is actively developing it. See: http://bioinformatics.mdanderson.org/main/TIGRA
For small deletions and insertions you should consider Pindel as well.
The position of the deletion can't be exactly determined using the BreakDancer method... that's why you need to perform assembly or run another tool that detects split reads / softclipped reads. You can usually find some false positive SNPs at the breakpoint boundaries as well.
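As a rough illustration of why adding or subtracting the size does not give a single answer, the sketch below computes the window of start positions that would be consistent with the call. It rests on two assumptions that are not guaranteed: that the deletion lies entirely between Pos1 and Pos2, and that the reported size is exact (in practice it is an estimate). The function name `candidate_start_window` is hypothetical.

```python
def candidate_start_window(pos1, pos2, size):
    """Return (earliest, latest) possible start of a deletion of length
    `size` lying entirely inside the interval [pos1, pos2].

    Assumes the reported size is exact, which is a simplification.
    """
    if size > pos2 - pos1:
        raise ValueError("size is larger than the reported span")
    return pos1, pos2 - size


# Values taken from the BreakDancer line in the question.
earliest, latest = candidate_start_window(2613378, 2615298, 1842)
print(f"deletion could start anywhere from {earliest} to {latest}")
```

Under those assumptions, every start position in the window is equally compatible with the paired-end evidence, which is why split-read or assembly-based tools are needed to pin down the actual breakpoints.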