Hi there, I want to extract all the reads located inside a given location. I would like to extract also the mates of those reads, because I'm working with PE data. I know that I can extract the reads using samtools and the region of interest, but how can I take off the mates too? This is my command:
samtools view -f 2 file.bam chr:start-end
I'm using flag 2 to consider only properly paired reads. I want to evaluate the strand the paired end reads to know if both are forward, reverse..
What I've done is to extract the mates falling inside the region, take the names of those reads, and grep those names in the original bam file to extract both reads. But the bam file is too big and it's terribly slow.
Thanks in advance,
I still find it deeply worrying how often we see people not using
-F 4
. There must be one event a week on Biostars alone where people don't use-F 4
.You NEED
-F 4
pretty much every time you want to filter on mapped reads, otherwise you will get reads that are in a proper pair, mapped to a chromosome with a sensible fragment length but are still NOT MAPPED! And it's not trivial; properly-paired-but-unmapped reads can have TLENs of a billion basepairs, which will totally screw your averages for insert length, etc, which are often used to normalise things - but those sort of bugs go totally and utterly unnoticed.For those who want to explore this very important point more. This utility is very helpful for thinking about SAM flags and their meaning. https://broadinstitute.github.io/picard/explain-flags.html.
Remember
-f
means give me (require) everything that matches the criteria of my flag (e.g.-f 2
, get 'read mapped in proper pair') and-F
means remove everything (filter) that matches the criteria of my flag (e.g.-F
4, remove 'read unmapped').Yes! I use this tool all the time. It is so helpful! Even if I think I have my flags right I always double check with this tool.
How is this possible?
I have taken a look at one of my bam files, extracted a region and sought for unmapped reads in the resulting sam output. I actually found a single read that is registered as belonging to the extracted region but it is also flagged as unmapped.
samtools view -F4 sample.bam chr4:0-10000 | samtools view -f 4
Do either of these posts help?
How to extract reads in a specific region from a BAM file and also their mates mapped elsewhere ?
A: Extract Alignment By Read Id From A Sam File
I wrote something today: http://lindenb.github.io/jvarkit/SamViewWithMate.html
Thanks Pierre. Is there a way to do the opposite of that ?
So instead of :
Do the inverse
Are you grep-ing those names one at a time or are you using
grep -Ff
Second one,
grep -Ff
.