I am using a probe-based sequencing method similar to Illumina's TruSeq or molecular inversion probe (MIP) protocols, in which I amplify only specific 100 bp regions of the human genome and sequence them. If everything worked ideally, I would expect sequencing reads to fall only within my probed regions. However, I get a substantial number of reads outside the target area that align robustly to other specific regions of the genome, and I don't understand why this would be the case.
I see some 200 extra variants that fall outside my targeted regions, and these are covered by thousands of reads. Is there something I should do differently bioinformatically, other than just eliminating these reads (I am using bwa mem for alignment and freebayes for variant calling), or is there a biological reason why many amplicons would align elsewhere in the human genome?
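
To make the on-target versus off-target distinction concrete, here is a minimal sketch of how alignments could be classified against the probe intervals. It is an illustration under assumptions, not my actual pipeline: the coordinates are made up, the function names `overlaps_target` and `classify_alignments` are hypothetical, and intervals are assumed to be 0-based and half-open (as in BED files).

```python
from collections import Counter


def overlaps_target(targets, chrom, start, end):
    """Return True if the half-open interval [start, end) on chrom
    overlaps any target interval. A linear scan is used for clarity;
    it is adequate for a few hundred probes."""
    return any(
        t_chrom == chrom and t_start < end and start < t_end
        for t_chrom, t_start, t_end in targets
    )


def classify_alignments(targets, alignments):
    """Count alignments as on-target or off-target, and tally the
    off-target ones per chromosome to see where they pile up."""
    totals = Counter()
    off_by_chrom = Counter()
    for chrom, start, end in alignments:
        if overlaps_target(targets, chrom, start, end):
            totals["on_target"] += 1
        else:
            totals["off_target"] += 1
            off_by_chrom[chrom] += 1
    return totals, off_by_chrom


# Hypothetical probe regions and read alignments (chrom, start, end).
targets = [("chr1", 1000, 1100), ("chr2", 500, 600)]
alignments = [
    ("chr1", 1050, 1150),  # overlaps the chr1 probe region
    ("chr1", 2000, 2100),  # same chromosome, outside the probe region
    ("chr3", 10, 110),     # chromosome with no probes at all
    ("chr2", 600, 700),    # adjacent to the chr2 region, no shared base
]

totals, off_by_chrom = classify_alignments(targets, alignments)
print(totals)
print(off_by_chrom)
```

In practice the alignment coordinates would come from the BAM file produced by bwa mem rather than from a hard-coded list. Tallying the off-target reads by location, instead of discarding them outright, would at least show whether they cluster in a small number of specific regions.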