Extract One-End Mapped Paired-End Reads From Bwa Bam/Sam Files.
2
2
Entering edit mode
13.1 years ago
Bioscientist ★ 1.7k

Now need to use Pindel(split-read approach to identify CNV). I'm just wondering how I can extract one-end mapped reads from BWA result? Does samtools have such options, or we need to extract by parsing the sam/bam file according to flags?

Thanks

Edit: Just another question. After BWA running, I actually do the filtering to pick out those mapping with MAPQ > 20; will this affect picking up unmapped/one-end mapped reads?

I guess this won't affect finding one-end mapped reads; but those two-end unmapped reads cannot be picked up.

Edit again: This is my another question: I ran the command:

samtools view -u -f 8 -F 260 map.bam  > oneEndMapped.bam

And let's look at the line of oneEndMapped.bam:

HWI-ST150_0129:2:1:1414:2357#0    153    chr1    154432976    37    101M    =    154432976    0    ACATACATATGTATGTAATATGTGATTATTACTGGTGGTTTGCTAAATAATGACAAATGGAAAAAAATATATGTCATCCAAAGAGAGCCACTGTTAATTTT    BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBdc^ddcacc\TdcdddddYbe_dffddcffcffecedecedaeeeedd\def^ffeff`c    RG:Z:101127_s_2    XT:A:U    NM:i:2    SM:i:37    AM:i:0    X0:i:1    X1:i:0    XM:i:2    XO:i:0    XG:i:0    MD:Z:16G20A63

How to interpret this: To me, this looks like both ends have the same sequences, and map to the same position. (leftmost coordinate of one read; and leftmost of another). Why do we say this represents one read maps while the other one doesn't map? To describe "one read maps while the other one doesn't map", in my mind, it should be like:

HWI-ST150_0129:2:1:1414:2357#0    153    chr1    154432976    37    101M    =    *   *

Also running

samtools view -u  -f 4 -F 264 map.bam  > oneEndUnmapped.bam

This oneEndUnmapped.bam is only 8K, means nothing. I don't quite understand this.

thx

samtools pindel • 17k views
ADD COMMENT
0
Entering edit mode

If all you did was select SAM mapping entries with score of > 20 and left the entire line intact, there shouldn't be any problem using the samtool view commands I posted.

ADD REPLY
0
Entering edit mode

Actually this inspires me to think about what MAPQ stands for. How it's calculated. I mean if one-end maps correctly with a probability p; while each of the both-mapping ends maps correctly with the same probability. Then will the MAPQ be the same? Or the both-mapping MAPQ twice the one-end?

ADD REPLY
0
Entering edit mode

Also I look at the samtools/view man page, and seems -f is required flag; while -F is filtering flag.And there's a page for explaining SAM flags:http://picard.sourceforge.net/explain-flags.html.....Can anyone kindly explain a little about required vs filtering flag? thx

ADD REPLY
0
Entering edit mode

Hi I edited my response to explain the -f and -F.

ADD REPLY
6
Entering edit mode
13.1 years ago

Samtools allows you to output entries with certain flag with the -f and -F options. SamTools manual: http://samtools.sourceforge.net/samtools.shtml

So for all pair-end reads where one end is mapped and the other end isn't mapped, this should output all the mapped end entries:

samtools view -u -f 8 -F 260 map.bam  > oneEndMapped.bam

and this will output all the unmapped end entries:

samtools view -u  -f 4 -F 264 map.bam  > oneEndUnmapped.bam

edit*

Use this tool to figure out what the flags mean: http://picard.sourceforge.net/explain-flags.html

In the first command to get one end mapped, the -f option will make Samtools take any entries that has 8, meaning take any entries that has mate unmapped. The -F option will make Samtools skip any alignments that has 260, meaning skip any where the read itself is unmapped and where it is not a primary alignment.

In the second command, -f of 4 will take any read that is unmapped. -F of 264 will skip any entry where the mate is unmapped (making both unmapped) and where it is not a primary alignment.

ADD COMMENT
1
Entering edit mode

samtools view -u -f 12 -F 256 map.bam > bothEndsUnmapped.bam

ADD REPLY
0
Entering edit mode

Why do we need -F 264 when we have specified -f 4? -f 4 will only filter-in unmapped reads so there won't be any primary alignments left, right?

ADD REPLY
0
Entering edit mode

Hi Damian Kao, oneEndUnmapped.bam appears to be truncated after running above commands. Is there a way to extract reads from this truncated BAM file?

ADD REPLY
0
Entering edit mode

Hello, I wanted to ask if sb can confirm me:

(sorry if this not the appropriate place ,but it's the discussion exactly to what I need to know! )

In case we want to keep the paired reads that are both mapped, is it enough to run:

samtools view -b -F 12 reads.bam > pairedMapped.bam

?

Thanks in advance

ADD REPLY
2
Entering edit mode
13.1 years ago
Rlong ▴ 340

Pindel now supports reading bams directly. Pindel looks for pairs where one mate has mapped and the other remains unmapped. Since pindel is reading these in directly from the bam, you should not need to worry about dumping them into some sort of intermediary file.

ADD COMMENT
0
Entering edit mode

oh, i see...thx

ADD REPLY

Login before adding your answer.

Traffic: 1874 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6