I am trying to split a SAM file into many smaller SAM files by line count, where the number of lines per file is a multiple of 8.
I tried the following series of commands:
samtools view -H Mapped_Pila/LBG07/LBG07_pe.sam > Mapped_Pila/LBG07/LBG07_pe.header.sam &&
samtools view Mapped_Pila/LBG07/LBG07_pe.sam | split -l 8000000 - Mapped_Pila_Split/LBG07/LBG07_pe.split. &&
find -L . | grep Mapped_Pila_Split/LBG07/LBG07_pe.split. | parallel --gnu -j4 "echo Mapped_Pila/LBG07/LBG07_pe.header.sam {} >> {}.tmp && rm -f {}" &&
rename "s/\.tmp$/\.sam/" Mapped_Pila_Split/LBG07/LBG07_pe.split.*
Unpacking this:
1) Write the header to a separate file.
2) Stream the headerless SAM alignments and split them into chunks whose line count is a multiple of 8 (8000000).
3) Find the split files and, for each of them, cat the header followed by the headerless records into a new .tmp file.
4) Remove the original split file.
5) Rename the .tmp files to .sam.
This results in a SAM file that is parsed as truncated by samtools. So, I think I am missing something.
Generally, I want to split very large SAM files into more tractable parts and later recombine them with samtools merge.
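Update: yep, I definitely used echo instead of cat in step 3, so each new file contained only the two file names rather than the header and the records. Oops.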
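
For anyone landing here later, a minimal sketch of steps 3 to 5 with cat in place of echo. The helper name add_header is hypothetical (not part of samtools), and it uses a plain shell loop instead of parallel to keep the example short. It writes each chunk directly to a .sam file, so the separate rename step is not needed. Treat it as a sketch under those assumptions rather than a tested pipeline for real data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a header file to every chunk that starts with
# a given prefix, writing <chunk>.sam and removing the headerless chunk.
# Usage: add_header HEADER_FILE CHUNK_PREFIX
add_header() {
    local header=$1 prefix=$2 chunk
    for chunk in "$prefix"*; do
        # If the glob matched nothing, it stays literal; skip it.
        [ -e "$chunk" ] || continue
        # Skip outputs left over from an earlier run.
        case $chunk in *.sam) continue ;; esac
        # cat copies the file contents; echo would only write the file names.
        cat "$header" "$chunk" > "$chunk.sam" && rm -f "$chunk"
    done
}

# Example wiring with the paths from the question (needs samtools on PATH):
# samtools view -H Mapped_Pila/LBG07/LBG07_pe.sam > Mapped_Pila/LBG07/LBG07_pe.header.sam
# samtools view Mapped_Pila/LBG07/LBG07_pe.sam | split -l 8000000 - Mapped_Pila_Split/LBG07/LBG07_pe.split.
# add_header Mapped_Pila/LBG07/LBG07_pe.header.sam Mapped_Pila_Split/LBG07/LBG07_pe.split.
```

The same cat command can be dropped into the parallel call from the question if the loop is too slow; the only essential change is cat with a single > redirect instead of echo with >>.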