Split SAM file into smaller SAM files by line count
3
2
Entering edit mode
8.3 years ago
cacampbell ▴ 60

So, I am trying to split a SAM file into many smaller SAM files -- by line number, where the line number is a multiple of 8.

I tried the following series of commands:

samtools view -H Mapped_Pila/LBG07/LBG07_pe.sam > Mapped_Pila/LBG07/LBG07_pe.header.sam && samtools view Mapped_Pila/LBG07/LBG07_pe.sam | split -l 8000000 - Mapped_Pila_Split/LBG07/LBG07_pe.split. && find -L . | grep Mapped_Pila_Split/LBG07/LBG07_pe.split. | parallel --gnu -j4 "echo Mapped_Pila/LBG07/LBG07_pe.header.sam {} >> {}.tmp && rm -f {}" && rename "s/\.tmp$/\.sam/" Mapped_Pila_Split/LBG07/LBG07_pe.split.*

Unpacking this:

1) get the header in a separate file

2) get the headerless SAM alignments and split them by a multiple of 8 (8000000)

3) find the split files from the original files, and for each of them, cat the header, then the headerless records into a new file

4) Remove the original split file

5) rename the tmp files to sam

This results in a SAM file that is parsed as truncated by samtools. So, I think I am missing something.

Generally, I want to split sam files that are very large into more tractable parts and then later recombine them with samtools merge.

SAM • 5.9k views
ADD COMMENT
2
Entering edit mode
8.3 years ago

You should add the header back to each of the splitted files.

something line cat Mapped_Pila/LBG07/LBG07_pe.header.sam split_file > split_file.sam

It should be

cat Mapped_Pila/LBG07/LBG07_pe.header.sam {} > {}.tmp && rm -f {}

in your command.

ADD COMMENT
0
Entering edit mode

YEP... I definitely did use echo instead of cat. Oops.

ADD REPLY
3
Entering edit mode
8.3 years ago
george.ry ★ 1.2k

I actually end up doing this with some regularity, so here's a smooth all-in-one method:

samtools view -H yourfile.bam > header

then

samtools view yourfile.bam | split - yourprefix -l 8000000 --filter='cat header - > $FILE.sam'

or

samtools view yourfile.bam | split - yourprefix -l 8000000 --filter='cat header - | samtools view -b - > $FILE.bam'
ADD COMMENT
0
Entering edit mode

Thanks, this does the trick -- I think my problem was that I stupidly used "echo" instead of "cat" -- so of course the file was invalid.

ADD REPLY
0
Entering edit mode
3.6 years ago

With python using pysam:

import pysam base_path_save=path_2_save samfile = pysam.AlignmentFile("sequences5/aligned.sam", "r") header1=samfile.header.to_dict() for read in samfile.fetch(): file_path=base_path_save+read.query_name+'.sam' with pysam.AlignmentFile(file_path, "w", header=header1) as outf: outf.write(read)

ADD COMMENT

Login before adding your answer.

Traffic: 2507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6