I have aligned assembled contigs to the reference genome. The contigs are very long (PacBio + HiC assembly), so a single "query" produces multiple alignments against the "reference", because structural variation breaks up the alignment.
I want to figure out whether those alignments are in the right order relative to the contig they originate from. The next step would be to check whether any of these alignments have deletions (leading to gaps) or change direction (due to inversions). Right now I can't find any pysam attributes that seem to do the trick, and I'm not sure whether this is encoded in some obscure tag of the BAM.
Suggestions?
Thanks a lot, that sounds very useful. I haven't looked at PAF alignments yet, only at BAM. What I ended up doing was writing a Python script that parses the CIGAR strings to get the order of the supplementary alignments. A preliminary script can be found here.
I'll definitely take a look at your approach the next time I need to do a similar task.
Looks cool! I just had a quick look at the code, and it seems to be a better approach for high-throughput processing of complex nanopore reads. Thanks a lot for sharing!
Comparing the results with your method would definitely be the next step, as I'm not 100% confident I did not mess up somewhere!