Hello,
I would like to identify genomic structural variants, specifically small genomic inversions, in a relatively small (2 Mb) bacterial genome using paired-end Illumina sequencing data.
This paper did something interesting, but it is impossible to replicate because the authors used custom MATLAB scripts that are not publicly available. The principle is interesting, though. They claim that plotting the gap size (the genomic distance between the two mapped reads of each pair) against genomic position produces a distinct pattern. Normally, the vast majority of pairs have a similar gap size, producing what they call a ribbon; if there is a genomic rearrangement, the points deviate from that pattern, forming what they call a funnel. See image below.
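To make the ribbon/funnel idea concrete, here is a minimal sketch. It is NOT the paper's method (their MATLAB code isn't available); it is a simple, swapped-in heuristic. Given a list of (position, gap size) pairs, however they were obtained, it defines the "ribbon" as the median gap ± k × MAD (median absolute deviation), then reports, per genomic window, the fraction of pairs falling outside that band. Windows overlapping a funnel should stand out with an elevated fraction. The names `robust_band` and `deviating_fraction`, the 10 kb window and k = 5 are all arbitrary assumptions to tune on real data.

```python
from statistics import median
from typing import Dict, List, Tuple


def robust_band(gaps: List[int], k: float = 5.0) -> Tuple[float, float]:
    """Return (low, high) bounds of the 'ribbon': median +/- k * MAD.

    Median/MAD are used instead of mean/SD so that the few abnormal pairs
    we are looking for do not inflate the band. Note: if almost all gaps
    are identical the MAD can be 0, making the band a single value.
    """
    med = median(gaps)
    mad = median(abs(g - med) for g in gaps)
    return med - k * mad, med + k * mad


def deviating_fraction(points: List[Tuple[int, int]], window: int = 10_000,
                       k: float = 5.0) -> Dict[int, float]:
    """Fraction of pairs per genomic window whose gap lies outside the ribbon.

    `points` is a list of (position, gap) tuples. Keys of the result are
    window start coordinates, in the same coordinate system as the input.
    """
    low, high = robust_band([gap for _, gap in points], k)
    totals: Dict[int, int] = {}
    outside: Dict[int, int] = {}
    for pos, gap in points:
        start = (pos // window) * window  # assign the pair to its window
        totals[start] = totals.get(start, 0) + 1
        if gap < low or gap > high:
            outside[start] = outside.get(start, 0) + 1
    return {start: outside.get(start, 0) / n for start, n in sorted(totals.items())}
```

For example, 100 pairs with gaps around 300 bp in the first window plus 10 pairs with a 2000 bp gap near position 20,000 give `{0: 0.0, 20000: 1.0}`: the second window is entirely "off the ribbon". A plain fraction ignores the funnel's shape; it is only meant as a first pass to point you at regions worth plotting in detail.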
My question is: how can I extract the gap size and the genomic position of each read pair from a SAM file so that I can plot them?
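One possible answer, as a minimal sketch using only the Python standard library: in a SAM record, column 4 (POS) is the 1-based leftmost mapping position, column 7 (RNEXT) is `=` when the mate maps to the same reference, and column 9 (TLEN) is the observed template length, which is positive for the leftmost mate and negative for the rightmost. Taking |TLEN| as the "gap size" is an assumption about what the paper plotted. The filters below (skip unmapped, secondary and supplementary records, and pairs split across references) are my own choices. The proper-pair flag (0x2) is deliberately NOT required, because aligners tend to clear it for exactly the abnormal pairs an inversion would produce. `extract_gaps` is a hypothetical helper name.

```python
import sys
from typing import Iterable, List, Tuple

# SAM FLAG bits (see the SAM format specification)
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
SKIP = FLAG_UNMAPPED | FLAG_MATE_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY


def extract_gaps(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (POS of leftmost mate, gap size) for each read pair in SAM text."""
    points = []
    for line in lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if not flag & FLAG_PAIRED or flag & SKIP:
            continue
        if fields[6] != "=":  # mate on another reference: TLEN is 0
            continue
        tlen = int(fields[8])
        # Each pair appears twice with TLEN of opposite sign; keeping only
        # the positive record counts each pair once (a simplification:
        # pairs with TLEN == 0 are dropped).
        if tlen <= 0:
            continue
        points.append((int(fields[3]), tlen))
    return points


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_gaps.py aln.sam > gaps.tsv
    with open(sys.argv[1]) as handle:
        for pos, gap in extract_gaps(handle):
            print(f"{pos}\t{gap}")
```

The resulting two-column file (position, gap) can be scatter-plotted in any tool. For BAM input, `samtools view aln.bam` prints SAM records (without the header) that can be fed to the same function, e.g. `samtools view aln.bam | python extract_gaps.py /dev/stdin > gaps.tsv`.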
Thank you in advance,
TP
I would suggest just emailing the authors of the paper and asking for the scripts.
I haven't heard from them yet. However, I think this would be fairly easy for someone who knows how to code (in Python, perhaps?). If I have time I might try it myself, but I'm no bioinformatician, so...