Different filter settings to different files in same run of samtools
2
1
Entering edit mode
9.6 years ago

I want to split my bam file into paired aligned reads and unpaired aligned reads for a much easier downstream analysis.

1) Is this possible to do in one run through the file with samtools?

2) If this is not possible with samtools directly, is it possible with the pysam wrapper library?

If not, perhaps just running the two filters in parallel should reduce IO due to caching...

Ps. I want the pairs in the paired file to be on contiguous lines, so that line 0 and 1 contain the first pair, line 2 and 3 contain the second pair and so on.

3) Is sorting the file by name enough to ensure that paired reads end up next to each other?

Sorry for these possibly dumb questions, I am a complete samtools/paired end reads newb.

samtools • 2.3k views
ADD COMMENT
3
Entering edit mode
9.6 years ago
samtools view -f 4 -o umapped.bam -U mapped.bam in.bam

http://www.htslib.org/doc/samtools-1.1.html

"-U FILE Write alignments that are not selected by the various filter options to FILE. When this option is used, all alignments (or all alignments intersecting the regions specified) are written to either the output file or this file, but never both. "

ADD COMMENT
1
Entering edit mode

Wow, how have I missed the -U option!

ADD REPLY
0
Entering edit mode

Nice to know about, but would it work exactly like I wanted? Filtering for those that are paired would output both those that are unpaired but also unaligned to the U file I guess. Still, thanks.

Edit: no it would work with a tweak: I can pipe the U file to standard out and then filter it again for unmapped reads with a pipe.

ADD REPLY
1
Entering edit mode

Just pipe things if you want to also filter unmapped reads: samtools view -bF 4 foo.bam | samtools view -bF 8 -o singletons.bam -U paired.bam -, or something like that.

ADD REPLY
0
Entering edit mode

that would be

samtools view -bu -F 4 in.bam | samtools view -f 1 -o paired.bam -U unpaired.bam -
ADD REPLY
0
Entering edit mode

depends with you mean with paired 'paired-read=1' 'properly paired=8'

ADD REPLY
0
Entering edit mode

I actually went for "mate unmapped", to keep discordant alignments with the properly paired.

ADD REPLY
0
Entering edit mode

8 is mate unmapped, 2 is PROPER_PAIR, afaics.

ADD REPLY
0
Entering edit mode

I guess it's a question of whether "paired in sequence" or "both mates of a pair aligned" is needed.

ADD REPLY
0
Entering edit mode

I guess I was being unclear, but 8 was what I intended. I can't see what 1 should mean unless paired/unpaired reads sometimes are mixed in fastq files before alignment.

ADD REPLY
2
Entering edit mode
9.6 years ago
  1. No, it's not possible
  2. This is absolutely possible and what I'd personally do. Pysam is convenient enough to use if you're familiar with python.
ADD COMMENT
0
Entering edit mode

I'll look into pysam then, thanks! Upvoted your answer, but I'll accept the first answer that includes a small snippet of code showing how to filter the file into paired and unpaired. I forgot to add that I want the pairs in the paired file to be contiguous, so that line 0 and 1 contain the first pair, line 2 and 3 contain the second pair and so on. I'll update my q.

ADD REPLY
0
Entering edit mode

Is your input file already sorted such that pairs are next to each other? If so, the output you want is already what you'll get (more or less, you might get read2 before read1, but that's also easy to deal with).

ADD REPLY
0
Entering edit mode

I get pairs next to each other by sorting by names, right? With possible unpaireds intercalated between paired reads.

ADD REPLY
0
Entering edit mode

Yes and that's also what most aligners will directly output (at least unless you tell them otherwise).

For coordinate sorted files, it's not that difficult to just store an alignment in a dict() until its mate comes along.

ADD REPLY
0
Entering edit mode

Thanks for all your kind help.

ADD REPLY

Login before adding your answer.

Traffic: 1375 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6