I have two sequences, shown at the end. I want to find all local alignments longer than 10, and possibly shorter than some upper bound (maybe I can filter those out later).
I tried Smith-Waterman, but I cannot find an implementation that reports the from/to (start/end) coordinates, so I can't identify where each result comes from.
I want to allow mismatches and gaps. Any ideas on which tool I can use?
It has to be fast, since I will be splitting a full chromosome/genome and redoing this task many times. I'd also like to avoid reverse-complement pairing.
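A minimal sketch of where Smith-Waterman's from/to coordinates come from: the alignment ends at the highest-scoring matrix cell, and it starts where the traceback from that cell reaches a zero. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions. This sketch reports only the single best local alignment. Finding every alignment above a length cutoff would need repeated, non-overlapping searches (e.g. Waterman-Eggert), which it does not do. Pure Python is also far too slow for chromosome-sized input, so this illustrates the coordinates, not a fast solution.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a vs b with a linear gap penalty (illustrative scores).

    Returns (score, a_start, a_end, b_start, b_end) as 1-based, inclusive
    coordinates, or None if no positive-scoring alignment exists.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match or mismatch
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            if H[i][j] > best:                  # best cell = alignment end
                best, best_i, best_j = H[i][j], i, j
    if best == 0:
        return None
    # Trace back from the best cell until a zero cell is reached
    i, j = best_i, best_j
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    # (i, j) is now the zero cell just before the first aligned pair
    return best, i + 1, best_i, j + 1, best_j
```

For example, `smith_waterman("AAAGATTACAGGG", "CCGATTACACC")` returns `(14, 4, 10, 3, 9)`: the shared `GATTACA` spans positions 4-10 of the first sequence and 3-9 of the second.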
>1
ATGGCTAGGAAACATAACCATCTTTGATCAACGAGCTAGTCAAGTAGAGGCATACTAGTG
ACACTCTGTTTGTCTATGTATTCACACATGTATCATGTTTCCGGTTAATACAATTCTAGC
ATGAATAATAAACATTTATCATGATATAAGGAAATAAATAATAACTTTATTGTTGCCTCT
AGGGCATATTTCCTTCAGAGTGCTAGTTGAAGGCACGTGATGTGCATCTTAGGTTAAACT
GTAAATCGAATCTTACTCTCGAGTAACTCCAAAAGCGGCCACCGAAAATCATTATTGGAT
TTGTTTTTCATTACCAGCACGTCGAGGTGAAAGATGTGCAGGTTCAGTGGGTCGACTTGC
GTTTTCACTGACATGTGGGACCGAATGTAAGCAAACAGACGATGGGCACACTCCGCGCGT
CTGCTGGCTGGCTGAGAAGAGAGAGGAGGGTTGAGCACGGACTGTAGGAAATTTCGTGAC
AATGTGAGTAGTACTAGTGGTGTACTGTACTCCTCGAGAGATAATGTGTTCGCTCTACGT
AACCAGCCCCTCGATATAGTGGGGTGGCATGGGCCATAACTAGCATTCACGTCTAACAGT
ACGGCACATTCCACCAATTTTCTTGGAATCCATGCTACAGCTATCTCTTCTCCCTCCTAT
TTTCTCTAACTCAGCCATCGCGCGGTGGGAGGGGTGGGGGTTTGCCGATGTCAGTTTGCT
CAACTTCGGTCCCACTTGTAAGTGAAAATGCAACACAGCACGCATCCCCCCTTGAGCGGC
GTGCCAGACCAACCGTGTGGAGGTGCGATGCGAAAATGGCAGGCTTTTCCTTTTGAACTT
GTAACATGTAAATTTTGGTTCTGCTCCTTGGCCCGACCGATAGATAGTAATGAGAAGTTG
ATGCCAAAAAATGAGAAGTTTGGTTCTGGCGCGGTTCATGCATGGTACGTACAAAAATTA
TGAAGTTGATGCAAAAGGCAAGATAATTACTAGTACAATATAGTCCCTCCATTCCTAAAT
ATTTGTATTTTTAGGAAAGATACATATGGATCCACCAACGACATCCTCACGCCCTAGTCC
GCCCGGTCACCAACCAACGGGTGAGGAGCAGCAGCGTTGGCGGCGAGCTCAACGTGCTTC
>1rc
CAATCGCAAGATCAGTTGACTAATGGAAGCTATACCCATGGAGGCGATGAGCGCGAGCAT
CAGCCGACTGTGCCAGATGGTCCATGACGCCGGCCTGCGGCCTGGCACCGAGGAACGTCT
CCAGGTCGTGCTTGAGGCCGCTAGGGCGAGAGGTCTCTTGGACGACAGCTTCGTCTCCTT
GTTCGACAAGGTCCTCGTTGGATTCCTCGACAAGTTCAACGTCGTGAAGAAGCTCACGGA
CGACCTTGACATACGCCTCCAGCCCACGCGCCCAGGCTCTACGATGCCCGCCACCCTCAA
CGACCTCTATGATGACAACCTCTTCGATGCACTGGTGGACCTGCGACTGCCCGTCGTCGT
GCCGGAGATTGTCCACCTCGAGGTCACGCTCGCCGCGCAGCGCCTGGCACAGTAAGACAC
CATCGACATAATCACCCACGTCTACGCGCAAATCGTCCACAAGGACTACTACATGCCAGA
GGAGGAGGATAGGACGCTGGCCTTCTTGGAACGCAGGGCAACCTTGGACGGCATTGTTTA
GAAGCACGTTGAGCTCGCCGCCAACGCTGCTGCTCCTCACCCGTTGGTTGGTGACCGGGC
GGACTAGGGCGTGAGGATGTCGTTGGTGGATCCATATGTATCTTTCCTAAAAATACAAAT
ATTTAGGAATGGAGGGACTATATTGTACTAGTAATTATCTTGCCTTTTGCATCAACTTCA
TAATTTTTGTACGTACCATGCATGAACCGCGCCAGAACCAAACTTCTCATTTTTTGGCAT
CAACTTCTCATTACTATCTATCGGTCGGGCCAAGGAGCAGAACCAAAATTTACATGTTAC
AAGTTCAAAAGGAAAAGCCTGCCATTTTCGCATCGCACCTCCACACGGTTGGTCTGGCAC
Can't you run BLAST with a reduced word_size?
https://www.ncbi.nlm.nih.gov/books/NBK279684/
word_size (blastn, integer, default 11): Length of initial exact match.
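A minimal sketch of the BLAST+ route, assuming `blastn` from BLAST+ is on the PATH and each sequence is saved as a FASTA file. The helper names `blastn_command`, `parse_hits` and `run_blastn`, and the word size of 7, are hypothetical choices for illustration. A few details are relevant to the question. `blastn` runs the `megablast` task by default, so the sketch selects `-task blastn` explicitly. `-strand plus` restricts the search to the query's plus strand, to avoid reverse-complement hits. `-outfmt 6` gives tab-separated output whose `qstart`/`qend`/`sstart`/`send` columns are the from/to coordinates. The length cutoffs (longer than 10, optional maximum) are applied afterwards in Python.

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blastn_command(query_fasta, subject_fasta, word_size=7):
    """Build a blastn command line (hypothetical helper, illustrative word size)."""
    return [
        "blastn",
        "-task", "blastn",            # the default task is megablast
        "-query", query_fasta,
        "-subject", subject_fasta,    # compare against a FASTA file, no database
        "-word_size", str(word_size),
        "-strand", "plus",            # search only the query's plus strand
        "-outfmt", "6",               # tabular output with start/end columns
    ]


def parse_hits(tabular, min_len=11, max_len=None):
    """Keep plus-strand hits with min_len <= alignment length <= max_len."""
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        length = int(f["length"])
        if length < min_len or (max_len is not None and length > max_len):
            continue
        sstart, send = int(f["sstart"]), int(f["send"])
        if sstart > send:  # subject coordinates run backwards on minus-strand hits
            continue
        hits.append((f["qseqid"], int(f["qstart"]), int(f["qend"]),
                     f["sseqid"], sstart, send, length))
    return hits


def run_blastn(query_fasta, subject_fasta, word_size=7, min_len=11, max_len=None):
    """Run blastn and return filtered hits (requires BLAST+ installed)."""
    cmd = blastn_command(query_fasta, subject_fasta, word_size)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_hits(result.stdout, min_len, max_len)
```

With hypothetical file names, `run_blastn("seq1.fa", "seq1rc.fa", max_len=200)` returns one tuple per hit, `(qseqid, qstart, qend, sseqid, sstart, send, length)`. Because the coordinates come straight from BLAST, they can be offset by each chunk's position when a chromosome is split into pieces.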