Hello all,
This is a basic enough question, but I can't seem to find the answer online.
I have a BAM from an RNA-seq alignment which is very unevenly covered. However, the mean coverage is huge (~100,000x; it's a mitochondrion). I want to use it for polishing, so I would like to retain about 100x "sliding window" (or simply per-nucleotide) coverage across the whole chromosome, and lose the rest.
How can I achieve this?
Thank you in advance, as always
UPD: Apparently, there's a discussion here that directly answers my question.
Nice, my current version of samtools (1.7) doesn't even have that yet! Thank you very much.
All right, I've read about the option and still can't figure out how to use the "filter expressions" for coverage-based filtering. I might be missing something obvious...
My bad. I meant option -s, not option -e. (fixed)
I see. I knew about -s, but this is not what I'm after: I don't want to proportionally downsample the reads, I only want to remove reads in the areas where coverage is > 100, for example.
OK, so you can try my tool: http://lindenb.github.io/jvarkit/Biostar154220.html
see also Capping coverage in bam file
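For anyone wondering how capping differs from proportional downsampling, here is a minimal sketch of one greedy way to do it. It is not the algorithm of the tool linked above, just an illustration. It assumes each read is reduced to a single 0-based half-open (start, end) interval (so spliced alignments are ignored), and the function name cap_coverage and its parameters are made up for this example. A read is kept only while at least one position it covers is still below the cap, so low-coverage regions keep all of their reads.

```python
def cap_coverage(reads, ref_len, max_depth=100):
    """Greedy coverage cap over a single reference sequence.

    reads     : iterable of (start, end) intervals, 0-based, half-open
                (hypothetical simplification of real alignments)
    ref_len   : length of the reference
    max_depth : target per-position depth
    Returns (kept_reads, depth_per_position).
    """
    depth = [0] * ref_len
    kept = []
    for start, end in reads:
        # Clip to the reference and skip empty intervals.
        start, end = max(0, start), min(ref_len, end)
        if start >= end:
            continue
        # Keep the read if it still adds coverage somewhere below the cap.
        if min(depth[start:end]) < max_depth:
            for pos in range(start, end):
                depth[pos] += 1
            kept.append((start, end))
    return kept, depth


if __name__ == "__main__":
    # 1000 reads pile up on the first half, only 5 cover the second half.
    reads = [(0, 50)] * 1000 + [(50, 100)] * 5
    kept, depth = cap_coverage(reads, ref_len=100, max_depth=100)
    print(len(kept), depth[0], depth[75])
```

Note the trade-off: because a read is kept if any of its positions is under the cap, depth can locally exceed max_depth where a kept read overlaps an already saturated region. Requiring every position to be under the cap would give a strict ceiling but would thin out the edges of poorly covered regions.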
Brilliant, I didn't find that thread somehow (it's tricky to select the right keywords). Thank you!
Brian's explanation about normalization via k-mers also makes a lot of sense. It has long puzzled me what Trinity was doing, for example.
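To make the k-mer idea concrete, here is a rough sketch of alignment-free normalization in the spirit of "digital normalization": a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a target. This is an assumption-laden toy, not a description of what Trinity or any other specific tool does internally. The names normalize_by_kmers, k and target are invented for the example, and reverse complements and sequencing errors are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_kmers(reads, k=5, target=3):
    """Keep a read only while its median k-mer count is below target.

    reads  : iterable of read sequences (strings)
    k      : k-mer length (toy value; real data would use a larger k)
    target : approximate depth to retain
    """
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for seq in reads:
        ks = kmers(seq, k)
        if not ks:
            continue  # read shorter than k
        if median(counts[km] for km in ks) < target:
            kept.append(seq)
            counts.update(ks)
    return kept


if __name__ == "__main__":
    reads = ["ACGTTGCAAG"] * 50 + ["TTTTGGGGCC"] * 2
    kept = normalize_by_kmers(reads, k=5, target=3)
    print(len(kept))
```

The abundant read is thinned to roughly the target depth while the rare one is left alone, which is the same effect as coverage capping but without needing an alignment.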