If I run bwa bwasw like this:
~/src/bwa/latest/bwa/bwa bwasw -z 1000 ~/refs/human_g1k_v37.fasta cluster-14263.walks
I get back the positive hits from some of the sequences, but also the empty hits for the sequences that don't map anywhere:
cluster-14263-walk-35 16 4 44630869 0 3S3M3D10M2I6M9D18M1I12M5I22M2I4M26S * 0 0 CAAGTACAGCTTTTCACTGAGGCATGGAATAAATATCCATTATGGCTCTGAGTGCCCCCAGAATGCACTGCGCACCAGCGCCATGTTTTAGCTCAAGCATAACAACTCCTGACC * AS:i:31 XS:i:31 XF:i:1 XE:i:0 XN:i:0 cluster-14263-walk-36 4 * 0 0 * * 0 0 GGTCAGGAGTTGTTATGCTTGAGCTAAAACATGGCGCTGGTGCGCAGTGCATTCTGGGGGCACTCAGAGCCATAATGGATATTTATTCCATGCCTCAGTGGAAG * cluster-14263-walk-37 4 * 0 0 * * 0 0 GGTCAGGAGTTGTTATGCTTGAGCTAAAACATGGCGCTGGTGCGCAGTGCATTCTGGGGGCAT *
What is the cleanest/recommended way of saving only the hits and not the sequences that have no hit?
Sorry if this has been answered before, I couldn't find an answer other than grepping on the resulting files.
Careful, MAPQ == 0 indicates the read could not be mapped unambiguously to the ref genome.
~/bin/bwa bwasw -z 1000 $ref $reads | ~/bin/samtools view -F 0x4 -bt $ref.fai - > file.bam # Perfect! Thanks Aaron!
Good point drio. If you want to do "OR" logic such as this (-F 0x4 || MAPQ != 0), I would recommend bamtools' JSON "rules" files.