Unfortunately only the first command seems to take effect.How can I fix this? Thank you.
(BTW, if there's an easier way to accomplish what I want, I'm open to any suggestion, although I would still like to understand why the above doesn't work.) Thank you.
Edit: I've found out why this doesn't work. A binary cannot be appended. Looks like I'll have to use sam output and convert back to bam.
aside from the proper result you want to obtain, the rationale of the problem is wrong as you are trying to concatenate binary bam files as if they were plain text files. what you should do is to perform the 3 filters independently, each one generating its own bam file, and then perform a final step that would merge those 3 files into a single one:
Edit: I just saw Pierre's comment above describing a oneliner that would do the work which looks much more elegant and efficient than this answer. I would definitely go for it. I'll leave this answer as I think that it highlights the underlying problem, although the best solution would be Pierre's (samtools view -f2 -h 1.bam ; samtools view -F4 1.bam; samtools -f8 1.bam) | samtools view -Sb -o merged.bam -
option -F and -f use a logical OR, not a logical AND.
So -F 60 is exclude: read unmapped OR mate unmapped OR read reverse strand OR mate reverse strand. With those last two flags, you're going to remove most of your reads even if they're properly mapped.
if you need a fine filtering for your reads you can use my tool: SamJS
I understand that. I'm actually trying to exclude the properly mapped reads. The last two commands are meant to add things that are not covered by the first, and they do according to the results I got from view -c (12, 35, and 64219 respectively for the three queries). The problem is that they are not adding anything when I try to output to file with the >> syntax. So I got only 12 reads in my file instead of 12+35+64219 expected. The problem here, I believe, is my unix syntax, not samtools.
ADD REPLY
• link
updated 2.9 years ago by
Ram
44k
•
written 10.1 years ago by
bongbang
▴
90
1
Entering edit mode
Oh, I see. you need to use a simple redirection '>' and merge the new files using samtools merge
Many thanks. If you edit your answer to include this, I'll accept it. It'll make it easier for future readers with the same question.
ADD REPLY
• link
updated 2.9 years ago by
Ram
44k
•
written 10.1 years ago by
bongbang
▴
90
1
Entering edit mode
FYI: each BAM file is ended with a special EOF block, so tools won't continue processing file once it's encountered, that's why tools report only 12 reads.
Actually correctly-written software should continue over embedded EOF blocks, exactly so that more BAM records could be appended to a BAM file (see ยง4.1.2 of the SAM spec). This didn't work for the O.P. because adding more BAM headers in the middle of the file won't work, and because some software has bugs in this admittedly esoteric area (including samtools at present).
I understand that. I'm actually trying to exclude the properly mapped reads. The last two commands are meant to add things that are not covered by the first, and they do according to the results I got from
view -c
(12, 35, and 64219 respectively for the three queries). The problem is that they are not adding anything when I try to output to file with the >> syntax. So I got only 12 reads in my file instead of 12+35+64219 expected. The problem here, I believe, is my unix syntax, not samtools.Oh, I see. you need to use a simple redirection '>' and merge the new files using samtools merge
Do you think I can use piping to obviate the need for temporary files?
yes but without using a binary format and the only first BAM needs to print the header (option
-h
). Something like:Many thanks. If you edit your answer to include this, I'll accept it. It'll make it easier for future readers with the same question.
FYI: each BAM file is ended with a special EOF block, so tools won't continue processing file once it's encountered, that's why tools report only 12 reads.
Actually correctly-written software should continue over embedded EOF blocks, exactly so that more BAM records could be appended to a BAM file (see ยง4.1.2 of the SAM spec). This didn't work for the O.P. because adding more BAM headers in the middle of the file won't work, and because some software has bugs in this admittedly esoteric area (including samtools at present).