Question

Different filter settings to different files in same run of samtools

1

Entering edit mode

10.3 years ago

Endre Bakken Stovner ▴ 970

I want to split my bam file into paired aligned reads and unpaired aligned reads for a much easier downstream analysis.

1) Is this possible to do in one run through the file with samtools?

2) If this is not possible with samtools directly, is it possible with the pysam wrapper library?

If not, perhaps just running the two filters in parallel should reduce IO due to caching...

Ps. I want the pairs in the paired file to be on contiguous lines, so that line 0 and 1 contain the first pair, line 2 and 3 contain the second pair and so on.

3) Is sorting the file by name enough to ensure that paired reads end up next to each other?

Sorry for these possibly dumb questions, I am a complete samtools/paired end reads newb.

samtools • 2.7k views

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by Endre Bakken Stovner ▴ 970

2

Entering edit mode

10.3 years ago

Devon Ryan 105k

No, it's not possible
This is absolutely possible and what I'd personally do. Pysam is convenient enough to use if you're familiar with python.

ADD COMMENT • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by Devon Ryan 105k

0

Entering edit mode

I'll look into pysam then, thanks! Upvoted your answer, but I'll accept the first answer that includes a small snippet of code showing how to filter the file into paired and unpaired. I forgot to add that I want the pairs in the paired file to be contiguous, so that line 0 and 1 contain the first pair, line 2 and 3 contain the second pair and so on. I'll update my q.

ADD REPLY • link 10.3 years ago by Endre Bakken Stovner ▴ 970

0

Entering edit mode

Is your input file already sorted such that pairs are next to each other? If so, the output you want is already what you'll get (more or less, you might get read2 before read1, but that's also easy to deal with).

ADD REPLY • link 10.3 years ago by Devon Ryan 105k

0

Entering edit mode

I get pairs next to each other by sorting by names, right? With possible unpaireds intercalated between paired reads.

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by Endre Bakken Stovner ▴ 970

0

Entering edit mode

Yes and that's also what most aligners will directly output (at least unless you tell them otherwise).

For coordinate sorted files, it's not that difficult to just store an alignment in a dict() until its mate comes along.

ADD REPLY • link 10.3 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks for all your kind help.

ADD REPLY • link 10.3 years ago by Endre Bakken Stovner ▴ 970

Ram · Accepted Answer · 2015-04-17

3

Entering edit mode

10.3 years ago

Pierre Lindenbaum 166k

samtools view -f 4 -o umapped.bam -U mapped.bam in.bam

http://www.htslib.org/doc/samtools-1.1.html

"-U FILE Write alignments that are not selected by the various filter options to FILE. When this option is used, all alignments (or all alignments intersecting the regions specified) are written to either the output file or this file, but never both. "

ADD COMMENT • link 10.3 years ago by Pierre Lindenbaum 166k

1

Entering edit mode

Wow, how have I missed the -U option!

ADD REPLY • link 10.3 years ago by Devon Ryan 105k

0

Entering edit mode

Nice to know about, but would it work exactly like I wanted? Filtering for those that are paired would output both those that are unpaired but also unaligned to the U file I guess. Still, thanks.

Edit: no it would work with a tweak: I can pipe the U file to standard out and then filter it again for unmapped reads with a pipe.

ADD REPLY • link 10.3 years ago by Endre Bakken Stovner ▴ 970

1

Entering edit mode

Just pipe things if you want to also filter unmapped reads: samtools view -bF 4 foo.bam | samtools view -bF 8 -o singletons.bam -U paired.bam -, or something like that.

ADD REPLY • link 10.3 years ago by Devon Ryan 105k

0

Entering edit mode

that would be

samtools view -bu -F 4 in.bam | samtools view -f 1 -o paired.bam -U unpaired.bam -

ADD REPLY • link 10.3 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

depends with you mean with paired 'paired-read=1' 'properly paired=8'

ADD REPLY • link 10.3 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

I actually went for "mate unmapped", to keep discordant alignments with the properly paired.

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by Devon Ryan 105k

0

Entering edit mode

8 is mate unmapped, 2 is PROPER_PAIR, afaics.

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by Endre Bakken Stovner ▴ 970

0

Entering edit mode

I guess it's a question of whether "paired in sequence" or "both mates of a pair aligned" is needed.

ADD REPLY • link 10.3 years ago by Devon Ryan 105k

0

Entering edit mode

I guess I was being unclear, but 8 was what I intended. I can't see what 1 should mean unless paired/unpaired reads sometimes are mixed in fastq files before alignment.

ADD REPLY • link updated 3.1 years ago by Ram 45k • written 10.3 years ago by Endre Bakken Stovner ▴ 970