Strictly speaking, I would say the answer is no, since you can't possibly know what is in the middle: the region that has not been sequenced.
You can, of course, reconstruct a likely fragment by taking the reference sequence from the end of the leftmost read to the start of the rightmost mate and assuming that to be the missing region.
```
REFERENCE    -------------------------------------
READ1 and 2  ---->          <----
                  ^^^^^^^^^^
FRAGMENT     |------------------|
```
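The reference-based guess described above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation. It assumes 0-based, half-open coordinates (the pysam convention), ungapped alignments with no indels or clipping, and both read sequences in reference (forward) orientation, which is how BAM stores them. `reconstruct_fragment` is a hypothetical helper. With pysam, the start positions and sequences would come from `AlignedSegment.reference_start` and `AlignedSegment.query_sequence`.

```python
def reconstruct_fragment(ref_seq: str, left_start: int, left_seq: str,
                         right_start: int, right_seq: str) -> str:
    """Guess the full fragment of a non-overlapping read pair.

    Assumptions (hypothetical helper, not a library API):
    - 0-based, half-open coordinates, as pysam uses;
    - ungapped alignments, so each read covers exactly len(seq) reference bases;
    - both sequences are in reference (forward) orientation, as stored in BAM.
    """
    left_end = left_start + len(left_seq)  # first reference base after the left read
    if right_start < left_end:
        raise ValueError("reads overlap; merge them instead of filling a gap")
    # The unsequenced middle (^^^ in the diagram) is only a guess taken from the reference.
    gap = ref_seq[left_end:right_start]
    return left_seq + gap + right_seq


# Toy example: the gap between the two reads is filled with "CCCC" from the reference.
print(reconstruct_fragment("AAAACCCCGGGGTTTT", 0, "AAAA", 8, "GGGG"))  # AAAACCCCGGGG
```

Keep in mind that anything the sample actually carries in the gap (SNVs, indels) is invisible here; the middle is simply copied from the reference.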
Sorry, I wasn't clear. Obviously you're right, but I'm only interested in overlapping reads, so that shouldn't be a problem.
In that case, the solution is simpler: fuse the reads and you have your fragment.
There are many read-merging programs out there:
https://ccb.jhu.edu/software/FLASH/
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2579-2
I would use those instead of PySam. If you wanted to use PySam, you could use various strategies to figure out the overlap and then concatenate the reads with that information, but in the end you would be reimplementing read merging yourself, so why bother?
The main challenge in using PySam is that there is already an alignment, and parsing it correctly (depending on the situation) can be complicated.
```
REFERENCE  -------------------------------------
READ1      ---------------->
READ2               <-----------------
FRAGMENT   1--------2------3----------
```
PySam will give you the coordinates indicated at the symbols 1 and 2, and from the CIGAR string you can compute the coordinate at 3. From those pieces you can compute how much to "chop" off the second read upon concatenating. Some caveats may still apply, though; it is best to recreate the fragments from the reads rather than after the alignment.
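The coordinate arithmetic described above could look roughly like this. It is a hedged sketch under several assumptions: 0-based, half-open coordinates as in pysam; "read 1" here means the leftmost mate (not necessarily READ1 of the pair) and it has no trailing soft clip; both sequences are in reference orientation; and where the reads disagree in the overlap, read 1's bases simply win (dedicated mergers such as FLASH take base qualities into account instead). All function names are hypothetical. pysam already exposes coordinate 3 directly as `AlignedSegment.reference_end`, and the inputs would correspond to `reference_start`, `cigarstring` and `query_sequence`.

```python
import re

# Per the SAM specification: operations that consume reference and/or query bases.
REF_OPS = set("MDN=X")
QUERY_OPS = set("MIS=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '50M2I48M' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def end_from_cigar(start: int, cigar: str) -> int:
    """Coordinate 3: one past the last reference base aligned by the read."""
    return start + sum(n for n, op in parse_cigar(cigar) if op in REF_OPS)


def bases_to_chop(start: int, cigar: str, stop: int) -> int:
    """Number of query bases of a read aligned at `start` that fall before `stop`."""
    ref, query = start, 0
    for n, op in parse_cigar(cigar):
        if ref >= stop:
            break
        if op in REF_OPS:
            step = min(n, stop - ref)  # only the part of this op left of `stop`
            ref += step
            if op in QUERY_OPS:        # M, = and X consume both
                query += step
        elif op in QUERY_OPS:          # I and S consume query bases only
            query += n
    return query


def merge_pair(r1_seq: str, r1_start: int, r1_cigar: str,
               r2_seq: str, r2_start: int, r2_cigar: str) -> str:
    """Concatenate an overlapping pair, dropping read 2's copy of the overlap.

    Assumes read 1 is the leftmost mate with no trailing soft clip, and that
    read 1's bases are kept wherever the reads disagree in the overlap.
    """
    pos3 = end_from_cigar(r1_start, r1_cigar)  # coordinate 3
    if r2_start >= pos3:                       # coordinate 2 must lie before 3
        raise ValueError("reads do not overlap")
    return r1_seq + r2_seq[bases_to_chop(r2_start, r2_cigar, pos3):]


# Toy pair: read 1 covers 0-9, read 2 covers 6-13, so 4 bases overlap.
print(merge_pair("ACGTACGTAC", 0, "10M", "GTACGGGG", 6, "8M"))  # ACGTACGTACGGGG
```

Walking read 2's CIGAR, rather than just subtracting coordinate 2 from coordinate 3, keeps the chop correct when read 2 has an insertion or deletion inside the overlap. The other caveats mentioned above still apply.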