This is my read file (containing 1 read):
@my_read
AACGCCGATCTCAGGACCAAAAAGGGGCATACCGAGACTACAGCACGAGATCTTACAACAATGGTAGTGTTCTGCGTAGATTCGTAAATTAAGATGATAACCTCTCGCATCCCTGTTTTATCTATGAATCGCTTCATACAGCCAAATGGCAGCTGCTCTGGATTTTGGTC
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
These are the first 50 lines from my reference file:
>my_ref
TAACTCTCGAAAACACGATGCAGACCAAAATCCAGAGCAGCTGCCATTTGGCTGTATGAA
GCGATTCATAGATAAAACAGGGATGCGAGAGGTTACCATCTTAATTTACGGATCTACGCA
GAACACTACCATTGTTGTAAGATCTCGTGCTGTAGTCTCGGTATGCCCCTTTTTGGTCCT
GAGATCGGCGTCCCAGACCCCCTCCCTGATGTCGTCAATATGGTTTTTATTTGGACCGGC
TGGGAATGCCAGCGAGGACAAGTGTACTCAGCTCATCAAATAATACTCAATGCACACTCG
TGATGCTAGGCCGCGCAGGGTACACTACATCGTCGGCGTGCACAATCGCTTCGGGATAGC
CGCTGAGATATATCGAGACGTCGGTGGGATGTAGGTTGCTCATAGGCGGAGCTTGGTCAA
TCCCGGTATTAAAGATCGTAATTTCTCGTCGAAGCAGTCAAGGGCCTTATGTCCTGTACG
GAGATCGATAGCAGACAAAAGCGTGGCGCAGCCTGGATACCGTCGACGTCGAGAAGCTTA
GATTTTGTAGTGGGTTTTAACATACATACCATAGGAAACGTTGTTTTGCGGTGAGGTCTA
GAATAGGGCTCCTGTGTGTCACCGCGACTATGTTTGCACGGCTTATCATGACCTTGTACG
CTGCCGAGTTCTTGTCCTGAGCGTCGGACCCATGCTGTGCTAGTCGGCACATATCAGTTC
GATTCGGTGATGGACTGGAATTTGCGGTGGCGTGCAATAAGTCCATCACACTGACCGTTA
TTTTTGCAGGCTCAGATAGGGCCTGGGTGCGTAGCAAAGCTTTTCGCCAGCATCGGGTGT
GTCTAGCAAGTGAAGAGGAGCTGCTATCACTTTCTCTTAGGTAATCGCATGTAGGAGATA
TACTGCTATACCCAGTGCTCTTGTACCGGATGTACTCGGGTGGTTCCTGCTCACGTTGGT
GTTAAACCCGTTGGTCTGGTCGGATTACCCACGAATCCGACAACCCGCTTCTGACGTGAT
GTGTGGATGCTACTCATCCAAAGCTATGATCCGTGATCTCAAAATGTAGGCCTCTGCACT
CTAGTTACGGTGTTATGTAGCGGACACAGCCTAAGAATGGTACGCAGATCGACGTTGTCG
TCTTGGTGACTCGTTTCGGGTGACGCTAGCAAATCGCGATGAAACGATGAATACGATACG
TAGTCCTAAGCGACCTACAACGGTGGATTTGACTGCCACAACGCGAGAAGACTCGGGGCA
TTTCCCCCCGGCGGCCTCCTTGAGATAGCCGTGGGGAGCTAATTTTAGTCTTCGTAAATC
GTTACAAAAATGACAATCCAAGGACATGTGGTTCATTGGAGCGATCATGATCTATATGAG
TAGTTACTCGGCGCACGATCCAGCTATGAGCTACCGGCTTAGGGGGTACTACCCGGACTT
AGGCATGAATACCAAACACCTGTCGTTTGACATTCTTTACCACGGACTGAGGGGGGTCTG
AGGGACCCATCTGCCCATCTGCGCGCTCACTCAAGCTCCAAGAGTTTCTTGTACGTACGG
CGTCATTTACTGAACGCTTCCTAATTGAAGATGAAGATTTGCGGATTCCTCGCTCGGTGA
AATCCTAACCAGTGTATGCTTTGTAGCGTACTGACCCGAAGAAATAAATGAGGTTAGCAA
CAGCAGCAACCAAGACACCCTTAAGTGGATCCCTCCCGGAACTCTGATCCAGTCACTGCA
GCAGACACTTTGTGCAGCCTGGCCCGTCATATATCCATAGAAGCCTGCAACATCACAAGC
CGGAATCAGCTCGCTGGCTCCTACGTCATGTGAGATCACACGTTCGCTACATGGATTGGG
CATTAAGCTCAAACTGAAGTACACTGCGGCTCTTTTCGTTAACTCGTGGTGCAATCACGC
ACTAACACGAGAGATGGACGGTACAGATACCGCTTAGTTCAAGAACATTCCACGCGTGCC
ACAAATCTCTTACATTATCCAAGGTAGCTTCTGCGGAATCACAAATTACGAGCTTGAATC
AACTGTCGCCCTATCAGCGGTTCCTCCGTATGACCCCGTTGTACGCTACATGTCTGAGTT
CCTGGGGTGAAAGCTCTCAATAGGATGAAGGTACGTACCCCGAGGGCGTGCACTTCTATC
TCCAGGGAAGGCCGCCGCTTCCCCGGAGTGAATCCATTATTCTGGCGGGGTCCAGGTTAA
CTCAGTCTTGAAAAGGTGCCTGGGAATGGTCGACTAGAGTTGCCCGATCGTTCCGCCTAT
CATTGGAACACTTATCCGTACATTAGCTTTAGGTTAACGGGGTAGATTGGGGGTCCCGTA
TCCTGAAGTTTAGGGGGAATTTTCAACCTATTCAACCGCTTGGTATATTTACGTAGGTCT
TGCGACCGCCGGGACGCCCTCAGGCTCCACATACTTCCCGCCACGTTGATGGGAGATTGT
TAACCAGGATAGATACAGAACCGTCGTCTCGGACATTCCAAACCTTTTCTGAGCCTCGCG
AGCAATACATACCGCAAGCACATACTGTTTAGTATTGTCTTAAGCGGAATGGATATACAC
TCCCAAGCCTTGCACAGGCGGTTGGTGTTATCGTCCGACGACGCAAATTATAAAACTTGA
GCCTTACCGAAGGCTCGTCTAGAGCAAATAGCCAGCTGATTTTAACTGGTCACGACGGTT
GCTTATGCACGGGATGTAGCTCCAGACCCTATTGCTTGGCGGACAACAATCGGCACCCTG
AGTCAACAAGTTGGGCTCTCATGCACAATCTATCGAGGCTACGACGATATTGTAAGGCAT
TTATCTCTAATAGTGAACATTAAAAGGCAAGTCTGGGTACGGTCGAAGGTCACAAAAGAA
CGGTAGATGCAGGGGCGCCGGGCCAGGGACTCGGTCACTAGTGGACCGATATTGTCGATT
If you now map that one read against the reference with this command:
bwa mem <ref_file> <read_file>
Then it will map at position 23.
The CIGAR string will be 170M.
Comparing the reference sequence with the read sequence, I notice that the identity between the two is only 31%. Yet bwa reports a mapping at position 23 with a mapping quality of 60. Why does it do that?
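One thing worth checking for this example: bwa mem aligns reads to both strands, and a SAM record with the 0x10 FLAG bit set means the read was aligned as its reverse complement. A base-by-base comparison of the read as written in the FASTQ against the forward reference would then look close to random, even for a near-perfect hit. The sketch below is a minimal Python check, not anything bwa itself runs. The helper names (`reverse_complement`, `identity`, `window`) are made up for illustration, and it assumes an ungapped alignment (consistent with the 170M CIGAR) starting at 1-based position 23.

```python
# Minimal sketch: compare a read to a reference window on both strands.
# Helper names are hypothetical; sequences are copied from the post above.

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window(ref, pos, length):
    """Slice `length` bases from `ref`, starting at 1-based position `pos`."""
    return ref[pos - 1 : pos - 1 + length]


# First four sequence lines of my_ref, joined without line breaks.
REF = (
    "TAACTCTCGAAAACACGATGCAGACCAAAATCCAGAGCAGCTGCCATTTGGCTGTATGAA"
    "GCGATTCATAGATAAAACAGGGATGCGAGAGGTTACCATCTTAATTTACGGATCTACGCA"
    "GAACACTACCATTGTTGTAAGATCTCGTGCTGTAGTCTCGGTATGCCCCTTTTTGGTCCT"
    "GAGATCGGCGTCCCAGACCCCCTCCCTGATGTCGTCAATATGGTTTTTATTTGGACCGGC"
)

READ = (
    "AACGCCGATCTCAGGACCAAAAAGGGGCATACCGAGACTACAGCACGAGATCTTACAACA"
    "ATGGTAGTGTTCTGCGTAGATTCGTAAATTAAGATGATAACCTCTCGCATCCCTGTTTTA"
    "TCTATGAATCGCTTCATACAGCCAAATGGCAGCTGCTCTGGATTTTGGTC"
)

ref_window = window(REF, 23, len(READ))
print("forward strand identity:", round(identity(READ, ref_window), 3))
print("reverse strand identity:",
      round(identity(reverse_complement(READ), ref_window), 3))
```

If the second number is much higher than the first, the 31% figure most likely comes from comparing the wrong strand rather than from a poor alignment.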
Hm, okay, but how can I force bwa mem to only output "really good" alignments? I cannot see a parameter like "minimum base identity".
I have found this thread from 2015: http://sourceforge.net/p/bio-bwa/mailman/message/34275912/
The developer suggests using the -T parameter (the minimum alignment score for an alignment to be output).
But that's not the same as saying "I want reported alignments to have at least 90% identity" ...
Your definition of "really good" is very different from the standard one. Most people want "likely to be correct", whereas you're asking for "low edit distance" (if such an alignment existed, it would have been reported instead). I suspect you'll have to post-process the alignments to get what you want.
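A rough sketch of what such post-processing could look like, in plain Python with no external libraries. It assumes each aligned record carries an `NM:i:` tag (edit distance), which bwa mem normally writes, and it defines identity as 1 - NM / (number of M, I, D, = and X columns in the CIGAR). That definition is only one of several in use, and the function names and the 0.9 threshold are illustrative choices, not anything built into bwa.

```python
import re

# Matches one CIGAR operation, e.g. "170M" -> ("170", "M").
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar, nm):
    """Identity as 1 - NM / aligned columns (M, I, D, =, X operations)."""
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    if columns == 0:
        return 0.0
    return 1.0 - nm / columns


def keep(line, min_identity):
    """Decide whether a SAM line passes the identity filter."""
    if line.startswith("@"):          # header lines always pass through
        return True
    fields = line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":    # unmapped read
        return False
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
    if nm is None:                    # no edit distance available: drop
        return False
    return alignment_identity(cigar, nm) >= min_identity


def filter_sam(lines, min_identity=0.9):
    """Return the SAM lines whose alignment identity is >= min_identity."""
    return [line for line in lines if keep(line, min_identity)]


# Toy records for illustration only (not real bwa output).
toy = [
    "@SQ\tSN:ref\tLN:100\n",
    "r1\t0\tref\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\tJJJJJJJJJJ\tNM:i:0\n",
    "r2\t0\tref\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\tJJJJJJJJJJ\tNM:i:2\n",
]
for line in filter_sam(toy, 0.9):
    print(line, end="")
```

In practice the same functions could be fed the lines of bwa mem's SAM output instead of the toy list. Note that soft-clipped bases (S) are not counted as aligned columns here, so a heavily clipped read can still pass; adding a minimum aligned length would be a reasonable extra check.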