I map raw pair end raw read to cDNA and now want to extract only pair end reads which are mapped to the cDNA.
I am using this samtool command, but it also reads whose one is mapped but other not:
samtools view -b -F4 in.bam > out_mapped.bam
I map raw pair end raw read to cDNA and now want to extract only pair end reads which are mapped to the cDNA.
I am using this samtool command, but it also reads whose one is mapped but other not:
samtools view -b -F4 in.bam > out_mapped.bam
The samtools flags
command prints the summary of the flags that you can use to filter your data, and the SAM format specification explains them in details.
$ samtools flags
About: Convert between textual and numeric flag representation
Usage: samtools flags INT|STR[,...]
Flags:
0x1 PAIRED .. paired-end (or multiple-segment) sequencing technology
0x2 PROPER_PAIR .. each segment properly aligned according to the aligner
0x4 UNMAP .. segment unmapped
0x8 MUNMAP .. next segment in the template unmapped
0x10 REVERSE .. SEQ is reverse complemented
0x20 MREVERSE .. SEQ of the next segment in the template is reversed
0x40 READ1 .. the first segment in the template
0x80 READ2 .. the last segment in the template
0x100 SECONDARY .. secondary alignment
0x200 QCFAIL .. not passing quality controls
0x400 DUP .. PCR or optical duplicate
0x800 SUPPLEMENTARY .. supplementary alignment
In your case, you need to filter in the PAIRED
reads, and filter out both the unmapped (UNMAP
) and the mapped whose mate is not mapped (MUNMAP
). Perhaps you will be interested in properly paired reads as well (PROPER_PAIR
), but this is more stringent than what you asked in your question.
Altogether, this makes either -f 0x1 -F 0xC
, or -f 0x2
. Have a look at the result of the command samtools flags 0xC
if you are puzzled about 0xC
.
In order to get a good combination of flags, I usually use http://broadinstitute.github.io/picard/explain-flags.html. You tick what you want to select for (or filter for) and use the number for the -f / -F parameter.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Depending what aligner you used you may have lost the R1/R2 read designationin the fastq header (bbmap preserves that when mapping). So keep that in mind as you extract the reads.