Pipe from BWA to sambamba
1
1
Entering edit mode
7.5 years ago

I am trying to pipe the output from BWA to sambamba to sort and index the sam files. I have 20 files with reads from sequencing (pair end) and want to have the resulting bam file (not the intermediate sam or bam files). This is the code I have at the minute:

for filename in ./seqtk_1/subsample_1/*_1.fq.gz;
do file=`echo $filename|sed 's/_1.fq.gz//'`;
filenopath=`basename $file`;
outputpath=./BWA/seqtk_1/subsample_1;
bwa mem -v 3 ./combine_reference.fa.gz ${file}_1.fq.gz ${file}_2.fq.gz > ${outputpath}/align_${filenopath}_BWA.sam |
sambamba view -S -f bam - > ${outputpath}/align_${filenopath}_BWA.bam |
sambamba sort -o - > ${outputpath}/sorted_${filenopath}_BWA.bam |
sambamba index - > {outputpath}/indexed_${filenopath}_BWA.bam;
done

This is the output:

-bash: {outputpath}/indexed_sub_NC_001539_BWA.bam: No such file or directory
sambamba-view: Unrecognized option -
sambamba-sort: Cannot open or create file '' : No such file or directory
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 100000 sequences (10000000 bp)...
[M::process] read 100000 sequences (10000000 bp)...

That continues through the rest of the files. I get a sam file and a sorted_${filenopath}_BWA.bam file but the bam file isnt populated.

My thinking is that the code isn't read/completed linearly and it is trying to create files that can't be created because BWA hasn't started running yet.

Is there a way to fix this? Or do I just need to run BWA and sambamba separately? I don't want to keep these sam files because the size is too large.

Thanks in advance

BWA Sambamba linux pipe • 7.4k views
ADD COMMENT
3
Entering edit mode
7.5 years ago

I don't know sambamba, so I don't know if you can use it to read from stdin, but I'll assume it does.

Your concept of pipes is wrong. Normally you would do the following

command1 input > output1
command2 output1 > output2

With pipes you would do

command1 input | command2 > output2

So you don't send the output of bwa to a sam file, but directly to sambamba.

ADD COMMENT
1
Entering edit mode

Thank you, so the pipe should look more like this:

for filename in ./seqtk_1/subsample_1/*_1.fq.gz;
do file=`echo $filename|sed 's/_1.fq.gz//'`;
filenopath=`basename $file`;
outputpath=./BWA/seqtk_1/subsample_1;
bwa mem -v 3 ./combine_reference.fa.gz ${file}_1.fq.gz ${file}_2.fq.gz > ${outputpath}/align_${filenopath}_BWA.sam |
sambamba view -S -f bam > ${outputpath}/align_${filenopath}_BWA.bam |
sambamba sort -o > ${outputpath}/sorted_${filenopath}_BWA.bam |
sambamba index  > {outputpath}/indexed_${filenopath}_BWA.bam;
done

I am completely new to bash

ADD REPLY
1
Entering edit mode

No, the pipe should look like this:

bwa mem -v 3 ./combine_reference.fa.gz ${file}_1.fq.gz ${file}_2.fq.gz  | sambamba view -S -f bam  | sambamba sort -o > ${outputpath}/sorted_${filenopath}_BWA.bam

The concept of pipes is not generating intermediate files. So don't do that. Leave out the > redirection symbol, and stream | directly to the next command. Using > and | is mutually exclusive.

Piping into index won't work. You'll need to do that separately.

Note again that I don't know sambamba but I'm just applying linux logic. Your mileage may vary. Somehow you'll have to tell sambamba that it should expect input on stdin. You'll have to dive in the documentation for that.

For samtools it would be like this:

bwa mem ref.fa reads.fastq | samtools sort -O BAM -o alignment.bam -

Note the final - which tells samtools that input is on stdin and not a file.

ADD REPLY
1
Entering edit mode

Thank you! That makes alot of sense

ADD REPLY
1
Entering edit mode

That doesn't work

Comes up with this:

Usage: sambamba-view [options] <input.bam | input.sam> [region1 [...]]

Options: -F, --filter=FILTER
                    set custom filter for alignments
         -f, --format=sam|bam|cram|json
                    specify which format to use for output (default is SAM)
         -h, --with-header
                    print header before reads (always done for BAM output)
         -H, --header
                    output only header to stdout (if format=bam, the header is printed as SAM)
         -I, --reference-info
                    output to stdout only reference names and lengths in JSON
         -L, --regions=FILENAME
                    output only reads overlapping one of regions from the BED file
         -c, --count
                    output to stdout only count of matching records, hHI are ignored
         -v, --valid
                    output only valid alignments
         -S, --sam-input
                    specify that input is in SAM format
         -C, --cram-input
                    specify that input is in CRAM format
         -T, --ref-filename=FASTA
                    specify reference for writing CRAM
         -p, --show-progress
                    show progressbar in STDERR (works only for BAM files with no regions specified)
         -l, --compression-level
                    specify compression level (from 0 to 9, works only for BAM output)
         -o, --output-filename
                    specify output filename
         -t, --nthreads=NTHREADS
                    maximum number of threads to use
         -s, --subsample=FRACTION
                    subsample reads (read pairs)
         --subsampling-seed=SEED
                    set seed for subsampling
sambamba-sort: Missing value for argument -o.
ADD REPLY
2
Entering edit mode

Apparently (I told you I didn't know about sambamba and you had to look into the documentation) you can use /dev/stdin and /dev/stdout to read from pipes and stream to pipes.

ADD REPLY
1
Entering edit mode

Okay, thank you for the help. I really appreciate it.

ADD REPLY
1
Entering edit mode

Glad I could help. If my answer resolved your question you should mark it as accepted. Upvote|Bookmark|Accept

ADD REPLY
1
Entering edit mode

Here is how to send stdin to sambamba.

@wouter, Oops! same link :P

ADD REPLY

Login before adding your answer.

Traffic: 2823 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6