Posted 6.2 years ago by max
(Originally posted by Milan Simonovic on SourceForge; I could reproduce it and have the same problem: https://sourceforge.net/p/bio-bwa/mailman/message/36190558/)
We are running BWA at what we believe is full sensitivity, yet it still fails to find a very obvious sequence on zebrafish chr10. Any ideas what we are missing?
# cat reads.txt
> testSeq
TTTATTTCCACACTTCATGG
Summary of alignable sequences on chr10:
TTTATTTCCACACTTCATGG  chr10:10367145-10367165  +  this is the test sequence
TTTATTTCCACATATGATGG  chr10:25378199-25378221  -  this is the sequence that was not found
TTTATTTCCACA**T*ATGG  (* marks a mismatch: 3 mismatches, no gaps)
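To double-check the mismatch line above, here is a minimal Python sketch that compares the two 20-mers position by position. The function name `mismatch_marker` is just an illustrative helper, not part of BWA, and the second sequence is taken exactly as written in the summary (already oriented to match the read).

```python
# Compare the test read with the hit that BWA did not report.
# Both sequences are copied from the summary above.
query = "TTTATTTCCACACTTCATGG"
target = "TTTATTTCCACATATGATGG"


def mismatch_marker(a: str, b: str) -> str:
    """Return a copy of `a` with '*' at every position where `a` and `b` differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (no gaps handled)")
    return "".join(x if x == y else "*" for x, y in zip(a, b))


marker = mismatch_marker(query, target)
n_mismatches = marker.count("*")

print(marker)        # TTTATTTCCACA**T*ATGG
print(n_mismatches)  # 3
```

With 3 mismatches and no gaps, the second location is within the edit distance allowed by `-n 4`, which is why its absence from the output is surprising.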
Steps to reproduce:
$ wget -O chr10.fa.gz ftp://ftp.ensembl.org/pub/release-85/fasta/danio_rerio/dna/Danio_rerio.GRCz10.dna.chromosome.10.fa.gz
$ ./bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
..
$ ./bwa index chr10.fa.gz
$ ./bwa aln -o 0 -n 4 -k 4 -N -l 20 chr10.fa.gz reads.txt > reads.sai
$ ./bwa samse -n 4 chr10.fa.gz reads.sai reads.txt
# returns just the exact match
[bwa_aln_core] print alignments...
0 10 10367146 2 20M * 0 0 TTTATTTCCACACTTCATGG * XT:A:U NM:i:0 X0:i:1 X1:i:135 XM:i:0 XO:i:0 XG:i:0 MD:Z:20
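For reading the optional tags in that record, here is a small Python sketch that pulls out the `TAG:TYPE:VALUE` fields. The helper `parse_tags` is hypothetical, and the record is copied as pasted above (whitespace-separated rather than tab-separated, as a real SAM file would be). As I understand the bwa manual, `X0` is the number of best hits and `X1` the number of suboptimal hits found, and `samse -n` caps how many hits are listed in the `XA` tag, so treat the interpretation in the comments as an assumption rather than a diagnosis.

```python
import re

# SAM record copied from the output above (as pasted, space-separated).
sam_line = (
    "0 10 10367146 2 20M * 0 0 TTTATTTCCACACTTCATGG * "
    "XT:A:U NM:i:0 X0:i:1 X1:i:135 XM:i:0 XO:i:0 XG:i:0 MD:Z:20"
)

# Optional SAM fields have the form TAG:TYPE:VALUE, e.g. X1:i:135.
TAG_PATTERN = re.compile(r"([A-Za-z][A-Za-z0-9]):([AifZHB]):(.*)")


def parse_tags(line: str) -> dict:
    """Collect optional TAG:TYPE:VALUE fields; integer tags become int."""
    tags = {}
    for field in line.split():
        match = TAG_PATTERN.fullmatch(field)
        if match:
            name, typ, value = match.groups()
            tags[name] = int(value) if typ == "i" else value
    return tags


tags = parse_tags(sam_line)
total_hits = tags["X0"] + tags["X1"]

print(tags["X0"], tags["X1"], total_hits)  # 1 135 136
print("XA" in tags)                        # False: no alternative hits listed
```

So BWA does report 135 suboptimal hits for this read, but none of them are listed in the record. Whether the missing chr10:25378199 location is among those 135 cannot be told from this output alone.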