Hi, after mapping you get a BAM file (the binary, compressed form of SAM), which contains information about both mapped and unmapped reads.
To get the unmapped reads from a BAM file, use:
samtools view -f 4 file.bam > unmapped.sam
The output will be in SAM format, without the header unless you also pass -h. To get the output in BAM instead, use:
samtools view -b -f 4 file.bam > unmapped.bam
To get only the mapped reads, use the parameter -F, which works like grep -v: it skips alignments that have any of the given flag bits set.
samtools view -b -F 4 file.bam > mapped.bam
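As a rough illustration of how -f and -F combine, the bash sketch below applies the rule from the samtools manual (-f keeps a record only if all bits in INT are set in its FLAG; -F drops it if any bit in INT is set) to plain integer FLAG values. The function name keeps is hypothetical and not part of samtools; it only models the bit test, not samtools itself.

```shell
#!/usr/bin/env bash
# keeps is a hypothetical helper modelling `samtools view -f REQ -F EXCL`:
# a record is kept only if ALL bits of REQ are set in its FLAG
# and NONE of the bits of EXCL are set.
keeps() {
  local flag=$1 req=$2 excl=$3
  if (( (flag & req) == req && (flag & excl) == 0 )); then
    echo yes
  else
    echo no
  fi
}

# FLAG 4 = read unmapped, FLAG 0 = read mapped to the forward strand
keeps 4 4 0   # -f 4 on an unmapped read -> yes
keeps 0 4 0   # -f 4 on a mapped read    -> no
keeps 0 0 4   # -F 4 on a mapped read    -> yes
keeps 4 0 4   # -F 4 on an unmapped read -> no
```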
From the manual, there are different integer codes you can use with the parameter -f, depending on what you want:
-f INT Only output alignments with all bits in INT present in the FLAG field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0]
Each bit in the FLAG field is defined as:
Flag    Dec   Chr  Description
0x0001  1     p    the read is paired in sequencing
0x0002  2     P    the read is mapped in a proper pair
0x0004  4     u    the query sequence itself is unmapped
0x0008  8     U    the mate is unmapped
0x0010  16    r    strand of the query (1 for reverse)
0x0020  32    R    strand of the mate
0x0040  64    1    the read is the first read in a pair
0x0080  128   2    the read is the second read in a pair
0x0100  256   s    the alignment is not primary
0x0200  512   f    the read fails platform/vendor quality checks
0x0400  1024  d    the read is either a PCR or an optical duplicate
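To see which rows of the flag table apply to a given FLAG value, you can test each bit in turn. A minimal bash sketch, assuming only the eleven bits listed in the table (the SAM specification defines further bits, such as 0x800 for supplementary alignments, which this ignores); decode_flag is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# decode_flag is a hypothetical helper: it prints "<bit><TAB><description>"
# for every bit from the table (0x1 .. 0x400) that is set in FLAG.
decode_flag() {
  local flag=$1 i
  local -a bits=(1 2 4 8 16 32 64 128 256 512 1024)
  local -a desc=(
    "read paired"
    "mapped in proper pair"
    "read unmapped"
    "mate unmapped"
    "read on reverse strand"
    "mate on reverse strand"
    "first read in pair"
    "second read in pair"
    "not primary alignment"
    "fails quality checks"
    "PCR or optical duplicate"
  )
  for i in "${!bits[@]}"; do
    # The bitwise AND is non-zero only when this bit is present in FLAG
    if (( flag & ${bits[i]} )); then
      printf '%d\t%s\n' "${bits[i]}" "${desc[i]}"
    fi
  done
}

decode_flag 99
```

For example, 99 = 64 + 32 + 2 + 1, so decode_flag 99 lists the paired, proper-pair, mate-reverse-strand and first-in-pair rows.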
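To get the uniquely mapped reads (reads with a single best mapping position), I filter on mapping quality with -q, since aligners such as BWA assign MAPQ 0 to reads that map equally well to several positions: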
-q INT Skip alignments with MAPQ smaller than INT [0]
samtools view -bq 1 file.bam > unique.bam
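-q works on the MAPQ value, which is the fifth column of each SAM record. To make that concrete, here is a hedged awk-in-bash sketch that applies the same threshold to SAM text. min_mapq_filter is a hypothetical name, the example records are made up, and unlike samtools view it always passes header lines (those starting with @) through.

```shell
#!/usr/bin/env bash
# min_mapq_filter is a hypothetical helper: it reads SAM text on stdin and
# keeps header lines plus records whose MAPQ (column 5) is >= MIN,
# mimicking the record-level effect of `samtools view -q MIN`.
min_mapq_filter() {
  local min=$1
  awk -F '\t' -v min="$min" '/^@/ || ($5 + 0 >= min + 0)'
}

# Made-up records: r1 has MAPQ 60 (kept), r2 has MAPQ 0 (dropped)
printf '@HD\tVN:1.6\nr1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\nr2\t0\tchr1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\n' \
  | min_mapq_filter 1
```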
HTH
Hello, can anybody tell me what the difference is between these two command lines?
1. samtools view -bh -f 4 -F 264 sample.bam > mapped.bam
2. samtools view -bh -f 8 -F 260 sample.bam > mapped.bam
Thanks, Mahdi
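Since -f and -F take sums of the flag values in the table, it helps to split a mask back into its individual bits. A small bash sketch; split_mask is a hypothetical helper, not a samtools command.

```shell
#!/usr/bin/env bash
# split_mask is a hypothetical helper: it prints the power-of-two
# components of a FLAG mask, smallest first, separated by spaces.
split_mask() {
  local mask=$1 bit=1
  local -a parts=()
  while (( bit <= mask )); do
    if (( mask & bit )); then
      parts+=("$bit")
    fi
    (( bit <<= 1 ))
  done
  echo "${parts[*]}"
}

split_mask 264   # -> 8 256  (mate unmapped + not primary)
split_mask 260   # -> 4 256  (read unmapped + not primary)
```

Read against the table, 264 = 256 + 8 and 260 = 256 + 4. So, for paired-end data, the first command keeps reads that are themselves unmapped (-f 4) but drops those whose mate is unmapped or that are non-primary (-F 264), giving unmapped reads with a mapped mate. The second keeps reads whose mate is unmapped (-f 8) but drops reads that are unmapped or non-primary (-F 260), giving mapped reads with an unmapped mate.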