How To Filter Mapped Reads With Samtools
6
63
Entering edit mode
12.1 years ago
sohadb1357 ▴ 640

Hi

How do I filter a bam file with some tools (Specifically -how I can remain with the unmapped reads only?). I have single-end mapping,

I searched for hours but everywhere I see the suggestion of samtools view -u -f 4 that (as I understood) doing the oposite thing - filtering out the unmapped reads.

Thanks!' Ohad

samtools • 298k views
ADD COMMENT
0
Entering edit mode

hello can anybody tell me What different between these two command-line?

1- samtools view -bh -f 4 -F 264 sample.bam > mapped.bam

2-samtools view -bh -f 8 -F 260 sample.bam > mapped.bam

thanks Mahdi

ADD REPLY
172
Entering edit mode
12.1 years ago

Hi, You get a bam (machine readable sam) file after mapping, and it contains information about mapped and unmapped reads.

To get the unmapped reads from a bam file use:

samtools view -f 4 file.bam > unmapped.sam

the output will be in sam

to get the output in bam, use:

samtools view -b -f 4 file.bam > unmapped.bam

To get only the mapped reads use the parameter F, which works like -v of grep and skips the alignments for a specific flag.

samtools view -b -F 4 file.bam > mapped.bam

From the manual; there are different int codes you can use with the parameter f, based on what you want:

-f INT Only output alignments with all bits in INT present in the FLAG field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0]

Each bit in the FLAG field is defined as:

Flag        Chr     Description
0x0001      p       the read is paired in sequencing
0x0002      P       the read is mapped in a proper pair
0x0004      u       the query sequence itself is unmapped
0x0008      U       the mate is unmapped
0x0010      r       strand of the query (1 for reverse)
0x0020      R       strand of the mate
0x0040      1       the read is the first read in a pair
0x0080      2       the read is the second read in a pair
0x0100      s       the alignment is not primary
0x0200      f       the read fails platform/vendor quality checks
0x0400      d       the read is either a PCR or an optical duplicate
  

Like for getting the unique reads (a single read mapping at one best position); I use:

-q INT Skip alignments with MAPQ smaller than INT [0]

samtools view -bq 1 file.bam > unique.bam

HTH

ADD COMMENT
6
Entering edit mode
samtools view -b -F 4 file.bam > mapped.bam

Does it really get all mapped reads because using the above gives me less reads than:

samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam
samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam
samtools view -b -F12 file.bam > bothEndsMapped.bam
samtools merge merged.bam onlyThisEndMapped.bam onlyThatEndMapped.bam bothEndsMapped.bam
ADD REPLY
6
Entering edit mode

From my understanding your first command extracts all mapped reads, but does not extract mates that were unmapped. The command in your second set:

samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam

Extracts UNMAPPED reads who's mate is mapped. Thus, you will have more reads with the second set of commands because you are including unmapped mates of mapped reads.

ADD REPLY
1
Entering edit mode

To make sure I have this correct...

The -f options flags to keep the reads 'Only output alignments with all bits set in INT present in the FLAG field' The -F option flags to remove the reads 'Only output alignments with all bits set in INT present in the FLAG field'

using this page I explored the int meanings but I still a bit confused.

In your first command:

samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam

The reads that are unmapped are removed (-F 4) and the mates that are unmapped are kept (-f 8). Does this mean it extracts the mapped reads and mates that are unmapped (I assume mate means the other read for paired-end reads)?

My first post so go easy on me :)

ADD REPLY
1
Entering edit mode

i agree with 5heikki's comment for filtering mapped PE reads if mapped means either both read and mate mapped or either mapped. From what I tested with pysam, this is their default definition of mapped in the fetch() function.

Edit: I changed my mind and realized that the command -f 4 -F 8 doesn't really generate useful alignments ( alignment record about the read itself and read itself being unmapped while its mate being mapped) so it's best to just use -F 4 to get mapped alignments for PE from sam/bam files. If you want to have only read1 mapped, use flag -f 64 -F 4, for read2, use flag -f 128 -F 4.

ADD REPLY
0
Entering edit mode

Does the below apply to single end sequencing data?

samtools view -b -F 4 -f 8 file.bam > onlyThisEndMapped.bam
samtools view -b -F 8 -f 4 file.bam > onlyThatEndMapped.bam
samtools view -b -F12 file.bam > bothEndsMapped.bam
samtools merge merged.bam onlyThisEndMapped.bam onlyThatEndMapped.bam bothEndsMapped.bam
ADD REPLY
1
Entering edit mode

I was reviewing this question and its answers. I tried your last suggestion in order to get the unique reads (a single read mapping at one best position) in my sam file. But I didn't get precisely the Unique mapped reads. I got things like this, instead :

SRR4293695.122264806    272 chr1    14671   1   29M *   0   0   GGGGGGAAAGGTGTCATGGAGCCCCCTAC   .F)<F7F)FFFFF)<)AFF7FFFF<AAAA   NH:i:3  HI:i:2  AS:i:26 nM:i:1
SRR4293695.122264806    272 chr9    14782   1   29M *   0   0   GGGGGGAAAGGTGTCATGGAGCCCCCTAC   .F)<F7F)FFFFF)<)AFF7FFFF<AAAA   NH:i:3  HI:i:3  AS:i:26 nM:i:1
SRR4293695.122264806    16  chr16   14356   1   29M *   0   0   GGGGGGAAAGGTGTCATGGAGCCCCCTAC   .F)<F7F)FFFFF)<)AFF7FFFF<AAAA   NH:i:3  HI:i:1  AS:i:26 nM:i:1

I am not sure if I have made something wrong. So probably, I should see instead the NH number as is suggested in : Efficiently extract alignments with NH:i:1 field.

ADD REPLY
1
Entering edit mode

samtools view -bq 1 file.bam > unique.bam This doesn't make sense at all. My best bet would be to compare XS with AS score.

ADD REPLY
0
Entering edit mode

So what would make sense?

ADD REPLY
0
Entering edit mode

After mapping via BWA, you will get a sam file. So do you mean, we need to convert the sam file to bam file and then using samtools view -b -F 4 file.bam > mapped.bam ?

ADD REPLY
2
Entering edit mode

You don't need to convert sam to bam file, its upto you as its just compressed version of sam, so saves some space.
Use this with sam,
samtools view -F 4 file.bam > mapped.bam

ADD REPLY
0
Entering edit mode

hey dude,

i am really thankful for your useful answer which helped me a lot

ADD REPLY
0
Entering edit mode

Hi,

Just a quick question, why does an alignment with a MAPQ score smaller than 1 mean a uniquely mapping read?

For example, I have the following read which is not unique/uniquely mapping (has the XS:i flag). But this has a MAPQ of 6 therefore would not be filtered out with this criteria? Any ideas?

HISEQ2500-09:128:H9FFTADXX:2:   163    scaffold_7359    909    6    100M    =    1189    380    TGATTATAAGGTTTTACATCACACATCTGAAAATTGATTGGGTACTTTCTTAAAAATTACATCATGATTATCATTTTTTTATTGTATGCAATTTTAATAA    @@@DDB?DDHH?DEBHGE??@FGHIIGGHEGDEECHEFGHII?BFBDFEHGHDCBAFGHIGIHIIGAEHCHIIIIIIIFEDCCCC&gt;@DCACCCCCDCCAC    AS:i:-6    XS:i:-5    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:71A28    YS:i:-16    YT:Z:CP
ADD REPLY
0
Entering edit mode

sorry, can I use samtools view -b -F 4 file.bam > mapped.bam for accepted_hits.bam produced by cufflinks for paired-end reads????

ADD REPLY
2
Entering edit mode
samtools view -b -f 2 file.bam > mappedPairs.bam
ADD REPLY
0
Entering edit mode

Thank you then by this command I can filter my accepted_hits.bam from tophat

samtools view -b -f 2 accepted_hits.bam > mappedPairs.bam
ADD REPLY
4
Entering edit mode
ADD REPLY
0
Entering edit mode

thank you so much

ADD REPLY
0
Entering edit mode

What if I only want to keep the mapped reads and remove all the unmapped? What should I do? Given that the unmapped is already deleted, to to convert the bam file into a fastq file then?

ADD REPLY
0
Entering edit mode

I tried to use this to remove the mapped rRNA from my samples, however in the output fastq files part of the header (1:N:0:AATAGCGGAG+CACTCAATT) is missing.

samtools view -h -b rRNAremoval.sam -o file.bam
samtools view -h -b -f 4 file.bam > unmapped.bam
samtools sort -n unmapped.bam -o sorted.bam
samtools fastq -@ 8 sorted.bam -1 R1.fastq -2 R2.fastq -0 /dev/null -s /dev/null -n

Would keeping unmapped reads or sorting somehow remove the header?

ADD REPLY
0
Entering edit mode

Hi Sukhi, Thanks for your answer. Do you know how to calculate the counts of the unmapped reads? For example, when trying using this command samtools view -f 4 file.bam > unmapped.sam.

ADD REPLY
16
Entering edit mode
10.0 years ago
rgiannico ▴ 190

I had the same issue but with Paired End Reads, and I solved using samtools and bamToFastq. You can find bamToFastq here.

  • If you need unmappedR1.fastq (containing both paired and unpaired R1 unmapped reads) and unmappedR2.fastq (containing both paired and unpaired R2 unmapped reads)

Use samtools -f 4 to extract all unmapped reads:

samtools view -b -f 4 file.bam > file_unmapped.bam
bamToFastq -bam file_unmapped.bam -fq1 unmappedR1.fastq -fq2 unmappedR2.fastq
  • If you need unmappedpairedR1.fastq (containing only paired R1 unmapped reads) and unmappedpairedR2.fastq (containing only paired R2 unmapped reads). Meaning you need all paired reads where at least one of them is unmapped.

Use samtools -F 2 to discard only reads mapped in proper pair:

samtools view -b -F 2 file.bam > file_unmapped.bam
bamToFastq -bam file_unmapped.bam -fq1 unmappedpairedR1.fastq -fq2 unmappedpairedR2.fastq
ADD COMMENT
0
Entering edit mode

I tried, but I couldn't install the hydra package, because I can't sudo. Is there a way to install it without root access, as other programs allow? (my server don't allow root access outside of the university)

Thanx!

ADD REPLY
0
Entering edit mode

Better to ask Aaron Quinlan, the developer. Try https://groups.google.com/forum/#!forum/bedtools-discuss

ADD REPLY
0
Entering edit mode

but with -f 4 only the mates are extracted that did not map, leaving out the part that did map ( -f 8). Or am I wrong? By this the information about pairing is lost, which is the valuable thing about paired-end datasets. There are cases, where one mate maps to one reference, but both do map to another.

ADD REPLY
9
Entering edit mode
9.4 years ago
Paul ★ 1.5k

Very good is from Broad Institute their web app for check all bitwise flags: https://broadinstitute.github.io/picard/explain-flags.html

ADD COMMENT
4
Entering edit mode
12.1 years ago
Lee Katz ★ 3.2k

bam2fastq is really helpful with this kind of question. Really, really helpful.

ADD COMMENT
2
Entering edit mode
6.3 years ago
PROCESSORS=10

Single_End_Layout:

samtools view --threads $PROCESSORS -b -F 4 in.bam > mapped.bam
samtools view --threads $PROCESSORS -b -f 4 in.bam > unmapped.bam

Paired_End_Layout

samtools view --threads $PROCESSORS -b -f 2 in.bam > mapped.bam
samtools view --threads $PROCESSORS -b -F 2 in.bam > unmapped.bam
ADD COMMENT

Login before adding your answer.

Traffic: 2074 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6