Hi everyone, I am new to bioinformatics and I was reading a review article on how to find structure variation using pair-end sequencing. While I understand what it means fine (if there is a deletion the distance between two read pair would decline and so on) I am struggling to write a program to find these "SV indicating" reads. Here's what I have done.
- align my reads to the reference genome using BWA-mem
- Using the following python code to find read pairs with either distance too big/small or hanging pair or alignment quality too small or bwa thinks it's not a proper match.
if (np.abs(read1_pos - read_mate_pos) > read_distance_big_threshold): output_pair_end_fastq(read, read_mate) # print("Distance Too Big") elif read.mate_is_unmapped: output_pair_end_fastq(read, read_mate) # print("Hanging Pair") elif (read.is_reverse == read.mate_is_reverse): output_pair_end_fastq(read, read_mate) # print("Orientation!") elif np.abs(read_mate_pos - read1_pos) < read_distance_small_threshold: output_pair_end_fastq(read, read_mate) # print("Distance Too Small") elif not read.is_proper_pair: output_pair_end_fastq(read, read_mate) # print("Not Proper Pair") elif read.mapping_quality < read_mapping_quality_threshold: output_pair_end_fastq(read, read_mate) # print("Quality Too Low")
The normal distance of my pair end reads is 500bp and the read_distance_big_threshold is 600 while read_distance_small_threshold is 400.
My problem is I am having tons of false positives meaning I have found a lot of "non SV indicating reads" in my way. So my question is "Is there any existing tools that can do this kind of thing for me? "
Specificly I am looking for a tool to find all the reads indicated in this graph
i'm pretty sure that the article cites those tools....
I wouldn't be using 10 year old bioinformatics software unless it's been constantly updated over that period. BreakDancer is a perfect example of such a highly cited unmaintained SV caller. It performs terribly on read length is over 50bp and pretty much every tool is benchmarked against it and, unsurprisingly, everyone outperforms BreakDancer.