3.7 years ago
lechu
I have a BED file with genomic regions. Is there a tool that, using this BED file, would let me process a BAM file so that all reads (and parts of reads) that fall outside the regions in the BED file are soft-clipped?
This truly gives the impression that I'm too lazy to use Google ;). Thanks!
@Pierre has answers/code written for questions that have not even been asked. These were examples where he already provided answers. So don't feel bad :-)
I looked at the PcrClipReads tool developed by @Pierre. It's great, but as far as I understand, it does not handle reads that overlap two (or more) regions. I realize I was not precise in my question. In a situation like the one depicted below, I would only like to clip the bases marked x (read 1 should not be clipped). The goal is to use exonic regions and remove everything that protrudes outside the exonic ranges (this may require "internal" clips in the reads). An equally good alternative would be to reset the qualities of the bases to be clipped to zero (instead of clipping). I tried to do this using sam2tsv (also from @Pierre), but ended up with file sizes that were impossible to handle (I need it for whole-transcriptome data). Then I ran out of ideas.
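To make the "reset qualities to zero" alternative concrete, here is a minimal pure-Python sketch of the per-read logic. It is not taken from any existing tool: the function names (`parse_cigar`, `in_regions`, `mask_qualities`) are hypothetical, and it assumes 0-based, half-open BED intervals on a single chromosome that are sorted and non-overlapping. Inserted and soft-clipped bases have no reference coordinate, so the sketch leaves them untouched, which is a design assumption you may want to change.

```python
from bisect import bisect_right

# CIGAR operations that consume query / reference bases (per the SAM spec).
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5M2I3M' into [(5, 'M'), (2, 'I'), (3, 'M')]."""
    ops, num = [], ""
    for ch in cigar:
        if ch.isdigit():
            num += ch
        else:
            ops.append((int(num), ch))
            num = ""
    return ops


def in_regions(pos, starts, ends):
    """True if 0-based pos lies inside one of the sorted half-open intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def mask_qualities(ref_start, cigar, quals, regions):
    """Return a copy of quals with 0 for every aligned base outside regions.

    ref_start: 0-based leftmost aligned reference position of the read.
    regions:   sorted, non-overlapping (start, end) BED-style intervals
               for the read's chromosome (assumption of this sketch).
    """
    starts = [s for s, _ in regions]
    ends = [e for _, e in regions]
    out = list(quals)
    q, r = 0, ref_start  # current query index and reference position
    for length, op in parse_cigar(cigar):
        if op in "M=X":  # aligned bases: check each reference position
            for k in range(length):
                if not in_regions(r + k, starts, ends):
                    out[q + k] = 0
        if op in CONSUMES_QUERY:
            q += length
        if op in CONSUMES_REF:
            r += length
    return out


exons = [(100, 110), (120, 130)]
# Unspliced read running through the intron: the middle bases are masked.
print(mask_qualities(105, "20M", [30] * 20, exons))
# Spliced read whose aligned blocks sit entirely in exons: left unchanged.
print(mask_qualities(105, "5M10N5M", [30] * 10, exons))
```

Because this works one read at a time, it could be applied while streaming through a BAM file (for example with a library such as pysam) rather than expanding every base into a table, which is what made the sam2tsv output so large.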
Update 2021: there is now samtools ampliconclip: http://www.htslib.org/doc/samtools-ampliconclip.html