I tried to find a short oligonucleotide sequence (probe) in a transcript, and I know for sure that the transcript contains the probe. However, the latest version of stand-alone BLAST (2.2.25+) found no match for it. Surprisingly, when I split the probe sequence into two parts, both parts were found in the transcript, one right after the other. Moreover, when I deleted the first two nucleotides of the probe, BLAST found the correct match. Could anyone explain what kind of problem I am facing? I have about 20 such probe sequences that BLAST did not find, even though each has a perfect match.
I searched with the following parameters:
blastn -query probe.fa -db target -task blastn-short -word_size 7 -evalue 100 -out res.out
UPDATE: Changing the -word_size parameter to 5 really helped, and BLASTN then found the correct match. BUT there are still several probes for which it fails to find the match, although the transcript definitely contains the probe sequence. The stand-alone BLAST accepts -word_size values down to 4, but even with -word_size 4 the match is not found. The online BLAST does find it. What should I do in this case?
Here is the data for one of the remaining problem cases:
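To illustrate why -word_size matters: blastn only starts an alignment from an exact word match (a seed) of length word_size shared by the query and the subject. The Python sketch below is a simplified model of that seeding step, not BLAST's actual implementation, and the function names are hypothetical. It lists which probe words of a given length occur exactly on either strand of the transcript.

```python
# Simplified model of seed finding; NOT how BLAST is implemented internally.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def seed_hits(probe: str, transcript: str, word_size: int) -> list[tuple[int, str]]:
    """Return (offset in probe, word) for every probe word of length
    word_size that occurs exactly on either strand of the transcript."""
    # Remove whitespace so space-separated sequence blocks can be pasted in.
    target = "".join(transcript.split()).upper()
    # Searching the reverse complement of the target covers minus-strand hits.
    target_rc = reverse_complement(target)
    probe = probe.upper()
    hits = []
    for i in range(len(probe) - word_size + 1):
        word = probe[i:i + word_size]
        if word in target or word in target_rc:
            hits.append((i, word))
    return hits
```

With a perfect 23-nt match, all 23 - W + 1 probe words should be found, so when a perfect match is still missed, the seeds are likely being discarded by some other step. One such step this sketch does not model: blastn masks low-complexity regions of the query with DUST by default (the -dust option), and masked positions are not used as seeds. Homopolymer and dinucleotide runs such as CCCCCCCC or GAGAGAGA are the kind of sequence DUST targets.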
>probe_seq
CCCCCCCCTCGGAGAGAGAGAGA
>transcript_seq
tccctctcccccccttctctctctctccgaggggggggggtcccagggagggaggggggg tcccccgatcagcatgtggctcctggcgctgtgtctggtggggctggcgggggctcaacg cgggggagggggtcccggcggcggcgccccgggcggccccggcctgggcctcggcagcct cggcgaggagcgcttcccggtggtgaacacggcctacgggcgagtgcgcggtgtgcggcg cgagctcaacaacgagatcctgggccccgtcgtgcagttcttgggcgtgccctacgccac gccgcccctgggcgcccgccgcttccagccgcctgaggcgcccgcctcgtggcccggcgt gcgcaacgccaccaccctgccgcccgcctgcccgcagaacctgcacggggcgctgcccgc catcatgctgcctgtgtggttcaccgacaacttggaggcggccgccacctacgtgcagaa ccagagcgaggactgcctgtacctcaacctctacgtgcccaccgaggacggtccgctcac aaaaaaacgtgacgaggcgacgctcaatccgccagacacagatatccgtgaccctgggaa gaagcctgtgatgctgtttctccatggcggctcctacatggaggggaccggaaacatgtt cgatggctcagtcctggctgcctatggcaacgtcattgtagccacgctcaactaccgtct tggggtgctcggttttctcagcaccggggaccaggctgcaaaaggcaactatgggctcct ggaccagatccaggccctgcgctggctcagtgaaaacatcgcccactttgggggcgaccc cgagcgtatcaccatctttggttccggggcaggggcctcctgcgtcaaccttctgatcct ctcccaccattcagaagggctgttccagaaggccatcgcccagagtggcaccgccatttc cagctggtctgtcaactaccagccgctcaagtacacgcggctgctggcagccaaggtggg ctgtgaccgagaggacagcgctgaagctgtggagtgtctgcgccggaagccctcccggga gctggtggaccaggacgtgcagcctgcccgctaccacatcgcctttgggcccgtggtgga tggcgacgtggtccccgatgaccctgagatcctcatgcagcagggagaattcctcaacta cgacatgctcatcggcgtcaaccagggagagggcctcaagttcgtggaggactctgcaga gagcgaggacggtgtgtctgccagcgcctttgacttcactgtctccaactttgtggacaa cctgtatggctacccggaaggcaaggatgtgcttcgggagaccatcaagtttatgtacac agactgggccgaccgggacaatggcgaaatgcgccgcaaaaccctgctggcgctctttac tgaccaccaatgggtggcaccagctgtggccactgccaagctgcacgccgactaccagtc tcccgtctacttttacaccttctaccaccactgccaggcggagggccggcctgagtgggc agatgcggcgcacggggatgaactgccctatgtctttggcgtgcccatggtgggtgccac cgacctcttcccctgtaacttctccaagaatgacgtcatgctcagtgccgtggtcatgac ctactggaccaacttcgccaagactggggaccccaaccagccggtgccgcaggataccaa gttcatccacaccaagcccaatcgcttcgaggaggtggtgtggagcaaattcaacagcaa ggagaagcagtatctgcacataggcctgaagccacgcgtgcgtgacaactaccgcgccaa caaggtggccttctggctggagctcgtgccccacctgcacaacctgcacacggagctctt 
caccaccaccacgcgcctgcctccctacgccacgcgctggccgcctcgtccccccgctgg cgccccgggcacacgccggcccccgccgcctgccaccctgcctcccgagcccgagcccga gcccggcccaagggcctatgaccgcttccccggggactcacgggactactccacggagct gagcgtcaccgtggccgtgggtgcctccctcctcttcctcaacatcctggcctttgctgc cctctactacaagcgggaccggcggcaggagctgcggtgcaggcggcttagcccacctgg cggctcaggctctggcgtgcctggtgggggccccctgctccccgccgcgggccgtgagct gccaccagaggaggagctggtgtcactgcagctgaagcggggtggtggcgtcggggcgga ccctgccgaggctctgcgccctgcctgcccgcccgactacaccctggccctgcgccgggc accggacgatgtgcctctcttggcccccggggccctgaccctgctgcccagtggcctggg gccaccgccacccccaccgcccccctcccttcatcccttcgggcccttccccccgccccc tcccaccgccaccagccacaacaacacgctaccccacccccactccaccactcgggtata gggggtgggtggggaggccctcctccccggccctccctggcccggccactccgaaggcag ggaggaggacttggcaactggcttttctcctgtggagtcgtcacacgccatccagcagcg ctaaggtggacatgggattcctccctgcgatgcgtgtctttcccacgcagagaagcccag tctcttctctggatctgggcctttgaacaactggggggcgttttctcccccccattggga caccagtcttcggtgtgtggaatgtggtattttcccgcgtggaggtgtgctttctcacaa cggggtgtgttttcccatgtgcagggtgaggtttttttttgccaccctggacacatgttg gccccctcaaagaatttctgtggggatttgtaccccagaatcctgttcccccatcccttc tcccacctcctcccctctccctccccctggagaccctggaagtggtgtgttcacatacag tgacccttggccaccagaccacagaggatggagcctgggaagcagcgaggaaatcacagc cccctcgcccctgcctcccttgcccctaccccggcgaagcatgttccccccgacgccccc cttggcacaagtcagatgaagcacgttctgccggggaggccctcaccttccagagaggac agacacagatttcctgctgggggagggaggagtccacgcatcctgatgctgcctggaagc ttattttcccgtggccaggacgcatttctctgagtggaaacaggttcttgcatgtggatg tgtgtttccccaggcagacggcccctctcttcccagcacttccctgcctcccccaggcct caggcccagcacccagttcctcctcacatggcaggtgagcacagacttctagttggcagg agctgaggagggtgaacaaaccccgagggaggcccggcccttgctcccgagttgggggga gggggtgtggcaacgtgccccccgcagaggccacgcatgtttgaccaaagccctcattgt ggtccgaggacagccttttccccaggcctcagagcattgctcatccgtgccaaactgggt aggtggatttgagcggaaagactcccaaaatgtgccaagaatttcccagtcccaggcagg gcaggggaaactaagggcaagcaggatacagggcgagggatgtggcaggtgagggggctc ccgcctgtgccccttctcctcaccatgtctcccccaccctgcctcagttctccgttcccc 
ttcatctccgtccccctctttgaagctgtccccatctcagtgtcagaccagccttctcct cagctgaccaccctcctctgacccacgccccctccttgtctgaaagaaaggagccttgaa tggtggagggaggcagtggggagaaaggtctcaccggacaggttgggagaatgaggtcag cggtgctggggaacagatggagggggcagtggggacagggcttgggcagacaccagcagg aataatttgaaatgtgtgaggtgactccccggagggccttgggcttgggcatttgggaaa agaatgatgtctggaagggcttaagggacacagtggacgaggggagagtcctcatctgct ggcattttgtggggtgttagtgccaaacttgaataggggctggggtgctgtcttccactg acacccaaatccagaatccctggtcttgagtccccagaactttgcctcttgactgtccct tctcttcctacctccatccatggaaaattagttattttctgatcctttcccctgcctggt ctagctcctctccaaacagccatgccctccaaatgctagagacctgggccctgaaccctg tagacagatgccctcagaattggggcatgggaggggggctgggggaccccatgattcagc cacggactccaatgcccagctcctctccccaaaacaatcccgacaatcccttatccctac cccaaccctttgcggctctgtacacatttttaaacctggcaaaagatgaagagaatattg taaatataaaagtttaactgtt
The correct match is at transcript positions 38-14 (minus strand).
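As a sanity check independent of BLAST, a plain string search can confirm where a probe occurs on either strand. The following is a minimal Python sketch (exact_matches and reverse_complement are hypothetical helper names, not part of any BLAST tool). It reports 1-based coordinates and writes minus-strand hits with start > end, the way BLAST reports subject coordinates.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def exact_matches(probe: str, transcript: str) -> list[tuple[int, int, str]]:
    """Return every exact occurrence of the probe as (start, end, strand).

    Coordinates are 1-based on the transcript; minus-strand hits are
    reported with start > end.
    """
    probe = probe.upper()
    # Remove whitespace so space-separated sequence blocks can be pasted in.
    target = "".join(transcript.split()).upper()
    n = len(probe)
    hits = []
    for strand, pattern in (("plus", probe), ("minus", reverse_complement(probe))):
        pos = target.find(pattern)
        while pos != -1:
            if strand == "plus":
                hits.append((pos + 1, pos + n, strand))
            else:
                hits.append((pos + n, pos + 1, strand))
            # pos + 1 (not pos + n) so overlapping occurrences are also reported.
            pos = target.find(pattern, pos + 1)
    return hits
```

Passing the probe and the transcript above gives coordinates that can be compared with the expected position and with what BLAST reports. A palindromic probe would be listed once per strand.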
Could you provide the command you used to run your local copy of BLAST? The parameter settings you used may differ from those used by the web tool.
I used