Extract reads within given region, and their mates
1
3
Entering edit mode
9.4 years ago
user230613 ▴ 380

Hi there, I want to extract all the reads located inside a given location. I would like to extract also the mates of those reads, because I'm working with PE data. I know that I can extract the reads using samtools and the region of interest, but how can I take off the mates too? This is my command:

samtools view -f 2 file.bam chr:start-end

I'm using flag 2 to consider only properly paired reads. I want to evaluate the strand the paired end reads to know if both are forward, reverse..

What I've done is to extract the mates falling inside the region, take the names of those reads, and grep those names in the original bam file to extract both reads. But the bam file is too big and it's terribly slow.

Thanks in advance,

bam samtools • 17k views
ADD COMMENT
6
Entering edit mode

I still find it deeply worrying how often we see people not using -F 4. There must be one event a week on Biostars alone where people don't use -F 4.

You NEED -F 4 pretty much every time you want to filter on mapped reads, otherwise you will get reads that are in a proper pair, mapped to a chromosome with a sensible fragment length but are still NOT MAPPED! And it's not trivial; properly-paired-but-unmapped reads can have TLENs of a billion basepairs, which will totally screw your averages for insert length, etc, which are often used to normalise things - but those sort of bugs go totally and utterly unnoticed.

ADD REPLY
3
Entering edit mode

For those who want to explore this very important point more. This utility is very helpful for thinking about SAM flags and their meaning. https://broadinstitute.github.io/picard/explain-flags.html.

Remember -f means give me (require) everything that matches the criteria of my flag (e.g. -f 2, get 'read mapped in proper pair') and -F means remove everything (filter) that matches the criteria of my flag (e.g. -F 4, remove 'read unmapped').

ADD REPLY
1
Entering edit mode

Yes! I use this tool all the time. It is so helpful! Even if I think I have my flags right I always double check with this tool.

ADD REPLY
1
Entering edit mode

How is this possible?

[reads] mapped to a chromosome with a sensible fragment length [which] are NOT MAPPED ?

I have taken a look at one of my bam files, extracted a region and sought for unmapped reads in the resulting sam output. I actually found a single read that is registered as belonging to the extracted region but it is also flagged as unmapped.

samtools view -F4 sample.bam chr4:0-10000 | samtools view -f 4

ADD REPLY
1
Entering edit mode
ADD REPLY
0
Entering edit mode

Thanks Pierre. Is there a way to do the opposite of that ?

So instead of :

java -jar ~/software/jvarkit/jvarkit.jar samviewwithmate -b bedfile sample.bam > only_targets_and_mates_elsewhere.bam

Do the inverse

 java -jar ~/software/jvarkit/jvarkit.jar samviewwithmate -b bedfile sample.bam (-v = INVERSE?)
> only_NOT_targets_and_mates_elsewhere.bam
ADD REPLY
0
Entering edit mode

Are you grep-ing those names one at a time or are you using grep -Ff

ADD REPLY
0
Entering edit mode

Second one, grep -Ff.

ADD REPLY
1
Entering edit mode
5.8 years ago
cmdcolin ★ 4.0k

The tool "Bazam" is built for this purpose but it outputs the reads in fastq https://github.com/ssadedin/bazam

Ref https://www.biorxiv.org/content/10.1101/433003v2

ADD COMMENT

Login before adding your answer.

Traffic: 1312 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6