The most frequent output of a mapper is a SAM file, but in order to process or visualize it, I need an index of a sorted BAM file. I usually work with a group of alignment files, so I use a pipeline like this:
for i in *.sam
do
    name=${i%.sam}
    # Pre-1.3 samtools syntax: sort takes an output prefix and appends .bam itself.
    samtools view -Sb "$name.sam" -o "$name.bam" &&
        samtools sort "$name.bam" "$name.sort" &&
        samtools index "$name.sort.bam" "$name.sort.bam.bai" &&
        rm "$name.sam" "$name.bam"
done
The problem with this kind of pipeline is that it uses a lot of disk space while it runs. I know there is a way to use temporary files in the pipeline to avoid the excessive disk usage. Can you tell me how to use temporary files instead of writing and deleting intermediate files (like $name.sam and $name.bam)?
You can pipe the SAM output from the aligner directly to samtools, and then pipe from there into samtools sort. That creates the sorted BAM on disk without using a ton of extra disk space (sort will write some temporary files of its own, but that's about it). Here's an example using bwa:
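A minimal sketch of such a pipe, assuming bwa mem and samtools 1.3 or newer (where sort takes -o instead of an output prefix). The function name and the file names in the example call are hypothetical placeholders, and the reference is assumed to be bwa-indexed already:

```bash
#!/usr/bin/env bash
# Hypothetical helper: align reads and write a sorted, indexed BAM
# without storing the intermediate SAM or unsorted BAM on disk.
# Assumes bwa and samtools (1.3+) are on PATH.
align_to_sorted_bam() {
    local ref=$1 reads=$2 out=$3
    # "-" tells samtools to read from standard input.
    bwa mem "$ref" "$reads" \
        | samtools view -b - \
        | samtools sort -o "$out" - \
        && samtools index "$out"
}

# Example call (placeholder file names):
# align_to_sorted_bam ref.fa reads.fq sample.sort.bam
```

Only the sorted BAM and its index end up on disk, apart from whatever temporary chunks sort writes while it is running.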
That's only sensible for the shortest of alignment tasks. If, for whatever reason, any of the steps fails, you get either no output or erroneous output.
Plus, trying to debug problems in that nested pipe is a nightmare, as you have no idea which step is erroring, especially given samtools' poor error messages. I moved to Picard shortly after trying to debug a simpler samtools pipe.
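For what it's worth, bash itself can report which stage of a pipe failed. A minimal sketch, assuming bash (pipefail and PIPESTATUS are bash features, not samtools ones). The function name run_stages is hypothetical, and the three stages are stand-ins for aligner, view, and sort, with the middle one failing on purpose:

```bash
#!/usr/bin/env bash
# Hypothetical demo: the three stages stand in for aligner | view | sort.
set -o pipefail   # the pipeline's exit status is non-zero if any stage fails

run_stages() {
    # Stage 2 fails on purpose to mimic a broken middle step.
    printf 'read1\nread2\n' | { cat >/dev/null; false; } | cat >/dev/null
    # PIPESTATUS must be copied immediately; the next command overwrites it.
    local codes=("${PIPESTATUS[@]}")
    echo "stage exit codes: ${codes[*]}"
    # Report the first stage that failed, if any.
    local i
    for i in "${!codes[@]}"; do
        if [ "${codes[$i]}" -ne 0 ]; then
            echo "stage $((i + 1)) failed" >&2
            return 1
        fi
    done
}
```

Calling run_stages prints the per-stage exit codes and names stage 2 as the failing one. This does not make the tools' own error messages any better, but it does narrow down where to look.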
OK, that does sound like a nightmare. I'm going to try Picard, but I'm afraid that because it is written in Java it will be slower than samtools. Is it?
Nice! Does the "-" argument mean "use standard input"?
Yes. You could also use /dev/stdin in its place (and /dev/stdout where a command expects an output file).
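A quick way to see that the two spellings name the same stream, using sort from coreutils as a stand-in for samtools (an assumption, made so the snippet runs without any alignment data; the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical demo: "-" and /dev/stdin both name standard input.
# coreutils sort stands in for samtools here.
sort_both_ways() {
    local via_dash via_dev
    via_dash=$(printf 'chr2\nchr1\n' | sort -)
    via_dev=$(printf 'chr2\nchr1\n' | sort /dev/stdin)
    # Both spellings read the same piped data.
    [ "$via_dash" = "$via_dev" ] && echo "same output"
}
```

The device-file form is handy for programs that insist on a file name and do not treat "-" specially.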