I'm trying to extract a specific region of a BAM file into a FASTA file (ultimately). All of the methods I've tried so far give me every read that OVERLAPS the desired region. I'm trying to find a way to trim those reads down to only the desired region.
I've tried:
samtools view
samtools view compiled.sorted.bam ConB:2185-2195
intersectBed
intersectBed -b test.bed -abam compiled.sorted.bam -ubam > out.bam
but both of these return the entire read that overlaps my desired region. What I want is a SAM/BAM file in which every 'read' has been trimmed to the region, so each one is only about 10 nucleotides long. Am I just missing a flag somewhere to limit the returned region?
So if a read spans the boundaries, you want to retrieve just the part of the read that lies inside the region?
Correct. I'd prefer to exclude things that are only partially inside the region ... although I can parse that out in my downstream analysis.
There's nothing in samtools, at least, to trim reads at a given boundary, since that's not exactly a common need. I suspect you'll need to code this up yourself (I can foresee some annoyances there).
Yeah, that's what I'm seeing. I figured it would be a more common request, but I guess not ... python to the rescue!
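For anyone landing here later, here is a minimal sketch of what such a Python script might look like. It is not an existing samtools or bedtools feature. It works on plain SAM text lines (e.g. the output of the samtools view command above), so it needs only the standard library. The function names trim_read_to_region and sam_to_fasta are made up for illustration. Writing deleted reference positions as '-' and dropping reads that only partially cover the region are my own assumptions about the desired behaviour, so adjust to taste.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def trim_read_to_region(pos, cigar, seq, start, end):
    """Return the read bases aligned to [start, end] (1-based, inclusive).

    pos is the 1-based leftmost mapped position (SAM column 4).
    Returns None when the read does not cover the whole region.
    Assumption: deleted/skipped reference positions are written as '-'.
    """
    ref = pos   # reference coordinate of the next reference-consuming base
    q = 0       # index into seq of the next read-consuming base
    out = []
    for length, op in CIGAR_RE.findall(cigar):
        for _ in range(int(length)):
            if op in "M=X":            # consumes read and reference
                if start <= ref <= end:
                    out.append(seq[q])
                ref += 1
                q += 1
            elif op == "I":            # insertion sits between ref-1 and ref
                if start < ref <= end:
                    out.append(seq[q])
                q += 1
            elif op == "S":            # soft clip: present in seq, not aligned
                q += 1
            elif op in "DN":           # consumes reference only
                if start <= ref <= end:
                    out.append("-")
                ref += 1
            # H and P consume neither the read nor the reference
    if pos > start or ref - 1 < end:   # read starts late or ends early
        return None
    return "".join(out)


def sam_to_fasta(sam_lines, start, end):
    """Yield FASTA records for reads that fully span [start, end]."""
    for line in sam_lines:
        if line.startswith("@"):       # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, pos, cigar, seq = fields[0], int(fields[3]), fields[5], fields[9]
        if cigar == "*" or seq == "*":  # no alignment or no stored sequence
            continue
        trimmed = trim_read_to_region(pos, cigar, seq, start, end)
        if trimmed is not None:
            yield f">{name}\n{trimmed}"
```

To use it, feed sam_to_fasta the lines produced by samtools view for the region and print each record. Note that samtools region strings are 1-based and inclusive, so ConB:2185-2195 covers 11 positions rather than 10.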