9 months ago
William
What can cause broken read-pairs (chimeric read pairs) in a sequencing run?
We are finding an unexpectedly high % of chimeric read pairs, based on:
- lower than expected % of proper read pairs
- higher than expected % of uniquely mapped reads whose mate maps to a different chromosome
- color coding of reads in IGV on insert size
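For anyone wanting to reproduce the first two numbers, here is a minimal sketch of how they can be counted from the FLAG, RNAME, MAPQ and RNEXT columns of SAM text (for example the output of `samtools view`). The function name `pair_stats` and the `min_mapq=20` cutoff standing in for "uniquely mapped" are assumptions for illustration; the right MAPQ threshold depends on the aligner, and only the read's own MAPQ is checked, not its mate's.

```python
def pair_stats(sam_lines, min_mapq=20):
    """Count proper pairs and inter-chromosomal pairs in SAM text lines.

    min_mapq is an assumed stand-in for "uniquely mapped"; tune per aligner.
    """
    paired = proper = interchrom = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        mapq, rnext = int(fields[4]), fields[6]
        # Keep paired reads only; drop secondary (0x100) and supplementary (0x800).
        if not flag & 0x1 or flag & (0x100 | 0x800):
            continue
        paired += 1
        if flag & 0x2:  # aligner marked the pair as proper
            proper += 1
        both_mapped = not flag & (0x4 | 0x8)  # read and mate both mapped
        if (both_mapped and mapq >= min_mapq
                and rnext not in ("=", "*") and rnext != rname):
            interchrom += 1
    return {"paired": paired, "proper": proper, "interchrom": interchrom}
```

Dividing `proper` and `interchrom` by `paired` gives per-read percentages that can be compared between runs. `samtools flagstat` reports similar counts directly.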
The difference from the expected % values for proper pairs and for mates mapping to different chromosomes is just a few %, but this is enough to cause a significant increase in structural variant (SV) analysis time.
I think this is unrelated to barcode hopping, because:
- There is only 1 barcode per read pair, as far as I know.
- Multiple species were sequenced in the same sequencing run; if the reads of a broken pair came from different species, they would be unmapped.
See also this IGV screenshot. Colored reads are uniquely mapped reads with an unexpected insert size relative to their mate. This is not a local pattern; it is genome-wide.
Reads are not out-of-sync in the input R1/R2 files, correct?
Have you done anything to bin/separate the reads before alignment that could have caused the above?
Will double-check the FASTQ files and whether the IDs of reads displayed as mates in IGV make sense. No custom processing has been done on the FASTQ files.
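A quick way to do that sync check, as a rough sketch: walk both files record by record and compare read IDs. This assumes plain 4-line FASTQ records (no wrapped sequence lines) and that R1/R2 names differ only by a trailing `/1` or `/2` or by the comment after the first space. The function names are made up for this example.

```python
from itertools import zip_longest

def read_ids(lines):
    """Yield read IDs from 4-line FASTQ records, without '@', '/1' or '/2'."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            name = line.strip().split()[0][1:]  # drop comment and leading '@'
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name

def first_out_of_sync(r1_lines, r2_lines):
    """Return the 0-based index of the first mismatching record, else None."""
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for n, (id1, id2) in enumerate(pairs):
        if id1 != id2:  # also triggers when one file is shorter
            return n
    return None
```

With gzipped input, the two arguments could be handles from `gzip.open(path, "rt")`. A return value of `None` means every record in R1 has the same ID as the record at the same position in R2.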
What is the relationship between the reference genome and the samples? I would expect something like this for a species with high levels of mobile genetic elements and a reasonably large evolutionary distance between the sample and reference genome populations.
Alternatively, the reference assembly may have collapsed or expanded many repetitive regions. Have you tried mapping to a repeat-masked reference?
Sample and reference are closely related, as also indicated by the single obvious SNP in a multi-kb region.