I am trying to simulate sample contamination at different dilution levels for NGS samples. Suppose I have two BAM files, SampleA and SampleB. I want to generate 5 contaminated samples from those two files, at dilutions of 10%, 20%, 30%, 40% and 50%. I understand that I should extract reads from one of the two BAM files at the given dilution percentage and add them to the other BAM file, but I don't know exactly how to do this. Can someone please explain the procedure?
Thanks
Thanks Devon. Could you please explain point 2 a bit more (shuffle the order of both files, since read generators typically produce reads in sorted order)? How would I select reads based on chromosome position (or do I even need to consider chromosome positions)? Say I have reads from chr2:220333-chr2:24444432 of SampleA and want to shuffle them into SampleB; how can I do this correctly?

You don't need to perform any selection. If you just want to look at a specific region, then restrict the generated reads to that region (if nothing else, make a FASTA file from only that region and generate reads from it).
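To illustrate why no position-based selection is needed, here is a minimal Python sketch of the mixing step. It works on toy read names rather than real BAM records, and the function name mix_reads, its parameters and the read counts are all made up for the example. The idea is simply to draw reads at random from each sample in the desired proportion and then shuffle the combined set so the two sources are interleaved.

```python
import random


def mix_reads(target_reads, contaminant_reads, fraction, total, seed=0):
    """Return `total` reads, of which round(total * fraction) are contaminant reads.

    Hypothetical helper for illustration only; real BAM records would need a
    BAM-aware tool or library instead of plain Python lists.
    """
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    n_contaminant = round(total * fraction)
    n_target = total - n_contaminant
    if n_contaminant > len(contaminant_reads) or n_target > len(target_reads):
        raise ValueError("not enough reads to sample from")
    # Random sampling ignores genomic position, so coverage stays proportional
    # across whatever regions the input reads came from.
    mixed = rng.sample(target_reads, n_target) + rng.sample(contaminant_reads, n_contaminant)
    rng.shuffle(mixed)  # interleave the two sources
    return mixed


# Toy read names standing in for SampleA and SampleB (assumed sizes).
sample_a = [f"A_read{i}" for i in range(1000)]
sample_b = [f"B_read{i}" for i in range(1000)]

# One mixed set per dilution level from the question.
dilutions = (0.1, 0.2, 0.3, 0.4, 0.5)
mixes = {f: mix_reads(sample_a, sample_b, f, total=500) for f in dilutions}

for f, reads in mixes.items():
    n_b = sum(r.startswith("B_") for r in reads)
    print(f"{f:.0%} dilution: {n_b} of {len(reads)} reads from SampleB")
```

With real BAM files the same idea is usually carried out by randomly subsampling each file to the required fraction (samtools view has an -s option for this) and then merging the two subsamples; check the tool's documentation for the exact usage.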