Hello,
ref --------------start-----------------------------------------stop-------------------------------
r1 -------------------------------------------------------
r2 ---------------------------------------------------------------
r3 ----------------- --------------------------------------
r4 -----------------------------------------------------------------------------
a. I would like to extract reads from bam file that overlap entirely the start and stop region from both directions (from start to stop and from stop to start). From the example above, I need to keep only r1,r2 and r4, but not r3.
How to do this?
I tried this, but I still have some small fragmented reads...
samtools view -b -h -q 10 input.bam chrX:230-330 | awk 'BEGIN{OFS="\t"}{if($1 ~ /^"@"/) {print} else {if($4 >= 230) {print} else {}}}' | samtools view -Sbo output.bam
I also tried:
samtools view -h -q 10 input.bam chrX:230-330 | awk 'BEGIN{OFS="\t"}{if($1 ~ /^"@"/) {print} else {if($4 >= 230 && length($10) >= 100) {print} else {}}}' | samtools view -Sbo output.bam -
If the regions are small you could combine the standard region querying with the filter expression, ie
I haven't tested it. Maybe it needs to be pos+rlen-1, but you get the jist. I hope it works! If not then maybe it'll need a samtools view | samtools view style command to do it in two passes.
Your diagram and text do not match - r1 is the only read that spans both the start and stop. Is that what you want?
I updated my question, thanks