Hi, I am trying to use results from BreakDancer and have found several problems with the output VCF (converted using this Python script: https://github.com/ALLBio/allbiotc2/blob/master/breakdancer/breakdancer2vcf.py).
- Start and end coordinates are not always correct (some records have an END smaller than the start).
- SVLEN does not follow the convention other tools use, i.e. SVLEN = END - START.
- How does BreakDancer define SVTYPE=INS? The start and end of its insertion calls are far apart, which I did not expect given how other tools represent insertions: Delly places END one nucleotide after START, and WHAMG states that "Insertions start and end at the same position".
WHAMG (insertions start and end at the same position):
chr1 1442939 . T <ins> . . SVTYPE=INS;END=1442939;
Delly (END = START + 1):
chr11 58325539 INS00059630 G GCACACACATGTACACA . PASS PRECISE;SVTYPE=INS;SVMETHOD=EMBL.DELLYv0.7.8;CHR2=chr11;END=58325540;PE=0;MAPQ=60;CT=NtoN;CIPOS=-10,10;CIEND=-10,10;INSLEN=16
The BreakDancer output, however, uses a different representation that I really don't understand:
chr1 24662379 . . . . PASS PROGRAM=breakdancer;SVTYPE=INS;SVLEN=-3686;SVEND=24721723
Original output from BreakDancer:
chr 1 24662379 59+59- chr 1 24721723 59+59- INS -3686 99 37 /projects/grs-lab/RupeshWork/ResultsMP/180920-110831_JGBF_LeJ_GT17-02246-MP_AGTCAA_S51_L008/mapped_bowtie2/JGBF_LeJ_GT17-02246-MP_AGTCAA_S51_L008_realigned.bam|37 -nan NA
Does anyone have advice on how to overcome these problems? SURVIVOR merge is not working on this file, apparently because of the conflicting END values in the BreakDancer VCF (or, I should say, in the native output format).
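To make the question concrete, here is a minimal sketch of the kind of normalization I have in mind. It is not part of breakdancer2vcf.py, and the function name `normalize_breakdancer_row` and its `collapse_insertions` option are hypothetical. It assumes the native output is tab-separated with the chromosome as a single token (e.g. chr1), that columns 1-8 are Chr1, Pos1, Orientation1, Chr2, Pos2, Orientation2, Type and Size, and that the end coordinate should be reported as END, as in the WHAMG and Delly records above. Whether collapsing an INS call to a single position is a valid reading of BreakDancer's output is exactly what I am unsure about.

```python
def normalize_breakdancer_row(line, collapse_insertions=True):
    """Turn one native BreakDancer row into VCF-style fields with POS <= END.

    Assumption: tab-separated columns Chr1, Pos1, Orientation1, Chr2, Pos2,
    Orientation2, Type, Size, ... Inter-chromosomal calls return None
    because a single END on one chromosome cannot describe them.
    """
    fields = line.rstrip("\n").split("\t")
    chrom1, pos1 = fields[0], int(fields[1])
    chrom2, pos2 = fields[3], int(fields[4])
    sv_type, size = fields[6], int(fields[7])

    if chrom1 != chrom2:
        return None

    # Order the two breakpoints so that END is never smaller than POS.
    start, end = min(pos1, pos2), max(pos1, pos2)

    if sv_type == "INS" and collapse_insertions:
        # Assumption: WHAMG-style insertion, END equals POS, and the
        # reported size (made positive) is used as SVLEN.
        end = start
        sv_len = abs(size)
    else:
        # Convention from the question: SVLEN = END - START.
        sv_len = end - start

    return {"CHROM": chrom1, "POS": start, "END": end,
            "SVTYPE": sv_type, "SVLEN": sv_len}


# The insertion call from this post, truncated to the first ten columns
# and written with "chr1" as a single token (an assumption).
row = "chr1\t24662379\t59+59-\tchr1\t24721723\t59+59-\tINS\t-3686\t99\t37"
record = normalize_breakdancer_row(row)
print(record)
```

With `collapse_insertions=False` the same row keeps both breakpoints, so END stays at 24721723 and SVLEN becomes END - START instead of the size reported by BreakDancer.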