Hello everyone,
I have a few BAM files with relatively high coverage. I want to compare how well several measures perform on lower-coverage samples. To do this, I want to downsample my BAM files (e.g. to 50%, 30% and 10% of the original coverage).
So far I tried to achieve this by using samtools view -s with the corresponding values (0.5, 0.3, 0.1), but it seems that I get output files with 0 coverage for many regions.
My questions:
1) Am I doing something wrong?
2) Does samtools view -s subsample the reads across the whole file or per position?
3) Is there a way to cap/downsample reads in a position-wise manner?
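1) No.
2) As far as I know it works on the whole file, not per position: the read name is hashed together with a seed (the integer part of the -s value), and the read is kept if the hash falls below the requested fraction, so mates are kept or dropped together. Regions that had only a few reads to begin with can therefore end up with none.
3) Not that I know of in samtools. Even coverage would also not be "natural" sequencing behaviour, since factors such as GC content and mappability bias make coverage intrinsically uneven. Getting even coverage would require cherry-picking reads until you get what you want.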
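
To illustrate why sampling reads across the whole file can leave regions empty, here is a small Python simulation. It is only a sketch under assumptions: the read layout is made up, and keep_read uses a CRC32 hash of the read name as a stand-in, which is not the hash samtools actually uses. All function names are hypothetical.

```python
import random
import zlib

GENOME_LEN = 2000
READ_LEN = 50

def keep_read(name, fraction, seed=0):
    # Stand-in for name-based subsampling (assumption: NOT samtools' real hash).
    # The read is kept if its hash, scaled to [0, 1), falls below the fraction.
    h = zlib.crc32(f"{seed}:{name}".encode()) & 0xFFFFFF
    return h / float(1 << 24) < fraction

def make_reads(rng_seed=1):
    # Toy data: a high-coverage first half and a low-coverage second half.
    rng = random.Random(rng_seed)
    reads = []
    for i in range(1200):
        reads.append((f"hi_{i}", rng.randrange(0, 1000 - READ_LEN), READ_LEN))
    for i in range(40):
        reads.append((f"lo_{i}", rng.randrange(1000, 2000 - READ_LEN), READ_LEN))
    return reads

def subsample(reads, fraction, seed=0):
    # Each read is kept or dropped on its own; position plays no role.
    return [r for r in reads if keep_read(r[0], fraction, seed)]

def coverage(reads, genome_len=GENOME_LEN):
    # Per-position depth from (name, start, length) tuples.
    depth = [0] * genome_len
    for _name, start, length in reads:
        for pos in range(start, min(start + length, genome_len)):
            depth[pos] += 1
    return depth

reads = make_reads()
for fraction in (1.0, 0.5, 0.3, 0.1):
    kept = subsample(reads, fraction)
    depth = coverage(kept)
    print(f"fraction={fraction}: reads kept={len(kept)}, "
          f"mean depth={sum(depth) / len(depth):.2f}, "
          f"positions with zero coverage={depth.count(0)}")
```

Because reads are dropped regardless of where they map, depth shrinks roughly in proportion everywhere, and positions that were thinly covered to begin with are the first to fall to zero.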