I've been trying to detect SV signals in each sample by going through the reads at each potential SV location, but I've run into a problem identifying breakpoints for inversions. Could anyone tell me how to use the split-read information in a SAM file to identify breakpoints, particularly when the two segments of a split read map to opposite strands? Put another way, given a read that has been split, what information is available to determine how the two segments should be reassembled, and in what orientation?
For example, let's say the unaligned read looks like this:
AAAATTTT
and is split and mapped like this:
AAAA TTTT
----------------------------->
What information is available to determine whether the original read was AAAATTTT or TTTTAAAA?
I was under the impression that the leading soft-clipped bases of each split held this information, but that does not seem to be the case for splits on opposite strands. I could be wrong, though, and I'm hoping I'm missing something here.
As a practical case, below are the two alignment records of a read that is split at a potential inversion breakpoint. I'm trying to figure out whether the breakpoint is at the leftmost or the rightmost position of each of the two alignments:
A00434:100:HM75NDMXX:2:1265:1497:27445 147 AaegL5_1 97039982 60 105M45S = 97039883 -204 AAACAAACGCCGATAAGACCCTGATCGACTCGGAACTACAATCTGTTGCGCTTTCTTCACAAACAATGGACCCACAACAGTTGGTGAGGCGCACTGGGAGGGAGCAAGTGCAACACGCTAAGAACTGGAGTCCTCCTAGCTAGTAGGAGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFF SA:Z:AaegL5_1,97041467,+,47M103S,60,0; MC:Z:150M MD:Z:105 RG:Z:HM75NDMXX.2.1101 NM:i:0 AS:i:105 XS:i:0
A00434:100:HM75NDMXX:2:1265:1497:27445 2179 AaegL5_1 97041467 60 47M103H = 97039883 -1585 CCTCCTACTAGCTAGGAGGACTCCAGTTCTTAGCGTGTTGCACTTGC FFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF SA:Z:AaegL5_1,97039982,-,105M45S,60,0; MC:Z:150M MD:Z:47 RG:Z:HM75NDMXX.2.1101 NM:i:0 AS:i:47 XS:i:0
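For reference, here is a minimal sketch of one way to reason about the two records above. It is not a definitive method. It relies on the SAM convention that CIGAR and SEQ are written in reference orientation, so for a reverse-strand alignment (FLAG bit 0x10, set in 147 but not in 2179) the trailing clip corresponds to the start of the read as it was sequenced. The names `Segment`, `from_sa`, and `junction` are hypothetical helpers made up for illustration, not part of pysam or any other library, and the sketch assumes a read with exactly two segments.

```python
import re
from dataclasses import dataclass

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Segment:
    """One alignment of a split read (hypothetical helper for illustration)."""
    chrom: str
    pos: int      # 1-based leftmost reference position (SAM POS)
    strand: str   # "+" or "-"; "-" means FLAG bit 0x10 is set
    cigar: str

    def ops(self):
        return [(int(n), op) for n, op in CIGAR_RE.findall(self.cigar)]

    def ref_span(self):
        """1-based inclusive reference interval covered by the alignment."""
        ref_len = sum(n for n, op in self.ops() if op in "MDN=X")
        return self.pos, self.pos + ref_len - 1

    def read_span(self):
        """0-based half-open interval on the read as it was sequenced."""
        ops = self.ops()
        i, lead = 0, 0
        while i < len(ops) and ops[i][1] in "SH":
            lead += ops[i][0]
            i += 1
        j, trail = len(ops) - 1, 0
        while j >= i and ops[j][1] in "SH":
            trail += ops[j][0]
            j -= 1
        aligned = sum(n for n, op in ops if op in "MI=X")
        # CIGAR is in reference orientation, so on the reverse strand the
        # trailing clip sits at the start of the sequenced read.
        start = trail if self.strand == "-" else lead
        return start, start + aligned


def from_sa(entry):
    """Parse one SA tag entry: rname,pos,strand,CIGAR,mapQ,NM."""
    chrom, pos, strand, cigar = entry.rstrip(";").split(",")[:4]
    return Segment(chrom, int(pos), strand, cigar)


def junction(a, b):
    """Reference positions joined at the split, ordered along the read."""
    first, second = sorted((a, b), key=lambda s: s.read_span()[0])
    s1, e1 = first.ref_span()
    s2, e2 = second.ref_span()
    # The last read base of the first segment and the first read base of the
    # second segment flank the junction; which reference end each one maps
    # to depends on the strand of that segment.
    bp1 = e1 if first.strand == "+" else s1
    bp2 = s2 if second.strand == "+" else e2
    return (first.chrom, bp1, first.strand), (second.chrom, bp2, second.strand)


primary = Segment("AaegL5_1", 97039982, "-", "105M45S")  # FLAG 147 record
supp = from_sa("AaegL5_1,97041467,+,47M103S,60,0;")      # SA tag of that record
print(primary.read_span(), supp.read_span())
print(junction(primary, supp))
```

Under these assumptions, the supplementary alignment covers read bases 0 to 47 on the forward strand and the primary covers read bases 45 to 150 on the reverse strand, so the sketch places the junction at the rightmost aligned base of both alignments (97041513 and 97040086). The two segments overlap by 2 bases on the read, which may reflect microhomology, so each position is probably only accurate to within about 2 bp.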
Any and all information would be greatly appreciated!
-Will