For a project I'm working on, I need to model the noise that occurs in real sequencing data because of alignment errors, contamination, etc. Specifically, I'm working with short-read paired-end data, either as FASTQs or BAMs. I want to model this noise so that I can distinguish it from the signal of actual subclonal variants.
By noise, I mean things like discordant read pairs with abnormal orientation that occur spuriously across the genome. This noise might come from alignment errors or from something else; I'm not sure yet.
I've simulated data with SVEngine, but the amount of noise (discordant pairs with insert sizes of 1 kb or more that don't overlap any simulated variant) is minimal: only 11 pairs in total.
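To make that counting criterion concrete, here is a minimal Python sketch of one way to flag a pair as discordant (mates on different chromosomes, not in forward/reverse orientation, or an insert of at least 1 kb) and then keep only the pairs that lie away from the simulated variants. The `ReadPair` record, the assumed 100 bp read length and the 500 bp padding around variant intervals are illustrative assumptions, not SVEngine output. With real BAMs, the fields could be filled from pysam `AlignedSegment` attributes such as `reference_name`, `reference_start`, `is_reverse`, `next_reference_name`, `next_reference_start` and `mate_is_reverse`.

```python
from dataclasses import dataclass

MIN_DISCORDANT_INSERT = 1000  # the "1 kb+" threshold


@dataclass
class ReadPair:
    """Hypothetical minimal record for one aligned read pair (0-based starts)."""
    chrom: str
    pos1: int
    rev1: bool
    mate_chrom: str
    pos2: int
    rev2: bool
    read_len: int = 100  # assumed equal length for both mates


def is_discordant(p: ReadPair, min_insert: int = MIN_DISCORDANT_INSERT) -> bool:
    """True if the mates are on different chromosomes, are not in
    forward/reverse (FR) orientation, or span an insert >= min_insert bp."""
    if p.chrom != p.mate_chrom:
        return True
    # Order the mates by position so orientation is judged left to right.
    if p.pos1 <= p.pos2:
        left, right, left_rev, right_rev = p.pos1, p.pos2, p.rev1, p.rev2
    else:
        left, right, left_rev, right_rev = p.pos2, p.pos1, p.rev2, p.rev1
    insert = right + p.read_len - left
    is_fr = (not left_rev) and right_rev
    return (not is_fr) or insert >= min_insert


def _overlaps(chrom, start, end, intervals):
    """Half-open overlap test against (chrom, start, end) intervals."""
    return any(c == chrom and start < e and s < end for c, s, e in intervals)


def background_discordant(pairs, variant_intervals, pad=500):
    """Keep discordant pairs whose mates both lie at least `pad` bp away
    from every simulated variant interval."""
    kept = []
    for p in pairs:
        if not is_discordant(p):
            continue
        mates = [(p.chrom, p.pos1, p.pos1 + p.read_len),
                 (p.mate_chrom, p.pos2, p.pos2 + p.read_len)]
        if any(_overlaps(c, s - pad, e + pad, variant_intervals)
               for c, s, e in mates):
            continue
        kept.append(p)
    return kept
```

The padding is a judgment call: pairs whose mates sit just outside a variant breakpoint are often still variant-supporting, so excluding a window around each variant avoids counting them as background.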
Is there a tool or method I can use to add this kind of noise, specifically discordant-pair noise? I don't want to substitute low-VAF subclonal variants for noise; I want to add something truly spurious.
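One possible way to get "truly spurious" discordant pairs, sketched here under simple assumptions, is to generate chimeric read pairs whose two mates come from unrelated, randomly chosen loci and append them to the simulated R1/R2 FASTQs before alignment. The helpers `spurious_pair` and `make_noise_fastqs` are hypothetical; the sketch assumes a single reference sequence held as an uppercase string, uniformly random loci, error-free reads and a constant quality character. Read 1 is always taken from the forward strand and read 2 is reverse-complemented half the time, so after alignment the spiked pairs should show a mix of long-insert FR, outward-facing RF and same-strand FF orientations.

```python
import random

# Complement table for uppercase A/C/G/T/N (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def spurious_pair(ref: str, read_len: int, min_gap: int, rng: random.Random):
    """Hypothetical helper: draw two loci at least `min_gap` bp apart and
    return (a, b, read1, read2). Read 1 is the forward strand at `a`;
    read 2 comes from `b` and is reverse-complemented half the time."""
    max_start = len(ref) - read_len
    if max_start < min_gap:
        raise ValueError("reference too short for the requested min_gap")
    while True:
        a = rng.randint(0, max_start)
        b = rng.randint(0, max_start)
        if abs(a - b) >= min_gap:
            break
    read1 = ref[a:a + read_len]
    read2 = ref[b:b + read_len]
    if rng.random() < 0.5:
        read2 = revcomp(read2)
    return a, b, read1, read2


def fastq_record(name: str, seq: str, qual_char: str = "I") -> str:
    """Format one FASTQ record with a constant (assumed) base quality."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


def make_noise_fastqs(ref: str, n_pairs: int, read_len: int = 100,
                      min_gap: int = 1000, seed: int = 0):
    """Hypothetical helper: return (R1 text, R2 text) for `n_pairs`
    spurious pairs. Read names encode both source loci for later checks."""
    rng = random.Random(seed)
    r1, r2 = [], []
    for i in range(n_pairs):
        a, b, s1, s2 = spurious_pair(ref, read_len, min_gap, rng)
        name = f"noise_{i}_{a}_{b}"
        r1.append(fastq_record(name + "/1", s1))
        r2.append(fastq_record(name + "/2", s2))
    return "".join(r1), "".join(r2)
```

The number of spiked pairs (`n_pairs`) is a free parameter here; ideally it would be tuned to match the background discordant-pair rate seen in real BAMs, and encoding the source loci in the read names makes it easy to confirm afterwards which discordant pairs came from the spike-in.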
Thanks in advance!