12 months ago
dariober
I have nanopore reads for Leishmania major aligned with minimap2. I observe an odd strand bias in the supplementary alignments: segments aligning to the forward strand are more common than segments aligning to the reverse strand (7601 vs 6396, or 54% vs 46%). For the main alignments, there is no evidence at all of strand bias (63946 forward vs 63666 reverse). As a table:
63666 Main reverse (samtools view -q 10 -F 2048 -f 16 -c)
63946 Main forward (samtools view -q 10 -F 2048 -F 16 -c)
6396 Supplementary reverse (samtools view -q 10 -f 2048 -f 16 -c)
7601 Supplementary forward (samtools view -q 10 -f 2048 -F 16 -c)
Does anybody have an explanation for how this bias in the supplementary alignments could originate?
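To put a rough number on how unusual these counts are, here is a minimal sketch of a two-sided binomial test using the normal approximation. The function names are made up for illustration. It assumes every alignment is an independent 50/50 draw, which is probably not true for supplementary alignments, since several of them can come from the same read, so the p-value is likely to overstate the evidence.

```python
import math

def strand_bias_z(forward, reverse):
    """z-score of the forward count under a 50/50 binomial null (normal approximation)."""
    n = forward + reverse
    expected = n / 2
    sd = math.sqrt(n * 0.5 * 0.5)
    return (forward - expected) / sd

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

# Counts taken from the table above.
for label, fwd, rev in [("main", 63946, 63666), ("supplementary", 7601, 6396)]:
    z = strand_bias_z(fwd, rev)
    print(f"{label}: forward fraction={fwd / (fwd + rev):.3f} z={z:.2f} p={two_sided_p(z):.3g}")
```

Running the same test on distinct read names rather than on alignments would be a fairer comparison, because it removes the dependence between segments of one read.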
It could be worth plotting where the reads are located on the genome, perhaps as coverage of the supplementary alignments by strand. If there is some 'pileup of reads', it could indicate a CNV or something like that. Supplementary alignments are also what I like to call 'split alignments', so a long read could align through the same CNV region multiple times, as multiple supplementary alignments.
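One way to look for such a pileup without plotting is to count supplementary alignments per strand in fixed-size bins. The sketch below reads SAM text lines, for example piped from `samtools view`, and uses only the standard SAM columns and flag bits. The function name, the 10 kb bin size and the synthetic records are assumptions for illustration. It bins on the leftmost position only, so an alignment spanning a bin boundary is counted once.

```python
from collections import defaultdict

SUPPLEMENTARY = 0x800  # SAM flag 2048
REVERSE = 0x10         # SAM flag 16

def strand_counts_by_bin(sam_lines, bin_size=10_000, min_mapq=10):
    """Return {(rname, bin_start): [forward, reverse]} for supplementary alignments."""
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # not a complete SAM record
            continue
        flag, rname = int(fields[1]), fields[2]
        pos, mapq = int(fields[3]), int(fields[4])
        if not flag & SUPPLEMENTARY or mapq < min_mapq or rname == "*":
            continue
        bin_start = (pos - 1) // bin_size * bin_size  # 0-based start of the bin
        counts[(rname, bin_start)][1 if flag & REVERSE else 0] += 1
    return dict(counts)

# Synthetic records, made up purely to show the input and output shape.
demo = [
    "r1\t2048\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t2064\tchr1\t150\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t2048\tchr1\t10001\t60\t50M\t*\t0\t0\t*\t*",
]
for (rname, start), (fwd, rev) in sorted(strand_counts_by_bin(demo).items()):
    print(rname, start, fwd, rev, sep="\t")
```

On real data the lines would come from standard input instead of the `demo` list, and bins with many alignments and a lopsided forward/reverse split are the ones worth inspecting in a genome browser.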
Thanks. Yes, there is a telomeric region with 1336 supplementary alignments on the forward strand and only 734 on the reverse. So a massive bias. These 1336 and 734 alignments come from 100 and 71 distinct reads, respectively, so the same read is often wrapped on itself several times. However, the most represented read has "only" 48 supplementary alignments, so it's not just a handful of very long reads causing the bias by chance. I still don't see why there should be more alignments on the forward strand, as if forward-aligning reads had a better chance of being sequenced.
Yes, I agree there is still a mystery to be uncovered, but it might just be that one should not strictly expect equal values. The reads in the fastq may be more or less randomly distributed between strands, but once they go through the aligner machinery, weird effects could happen (I am not sure!).
My 'CNV split alignment' hypothesis above would only explain the case where, just by chance, a forward read gets split-aligned a bunch of times, producing a bunch of forward alignments. But a reverse read in the same area could, or even should, do the same thing. Not sure, but looking at unmapped reads, or at reverse-strand reads with large amounts of soft clipping and no associated split alignments, could potentially reveal the 'missing reverse strand' data.
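A rough sketch of that last check: count, per strand, the primary alignments that carry a lot of soft clipping but no `SA` tag, meaning no supplementary alignment was reported for the clipped part. It reads SAM text lines and relies only on the standard flag bits, the CIGAR `S` operation and the `SA:Z:` optional tag. The function names and the 500-base threshold are arbitrary assumptions, not anything tuned for this dataset.

```python
import re
from collections import Counter

UNMAPPED, REVERSE, SECONDARY, SUPPLEMENTARY = 0x4, 0x10, 0x100, 0x800
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_bases(cigar):
    """Total number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")

def clipped_without_split(sam_lines, min_clip=500):
    """Count primary alignments with >= min_clip soft-clipped bases and no SA tag, by strand."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue              # keep primary, mapped alignments only
        if any(tag.startswith("SA:Z:") for tag in fields[11:]):
            continue              # this read already has a split alignment
        if soft_clipped_bases(fields[5]) >= min_clip:
            counts["reverse" if flag & REVERSE else "forward"] += 1
    return counts
```

If the reverse-strand count came out clearly higher than the forward one, that would be consistent with reverse reads losing their split segments somewhere in the alignment step, although it would not prove it.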