Hello everyone,
I'm in the process of designing complementary capture probes for two strains (type1 and type2) of the same virus. Most of their sequences are similar, but each strain has unique regions. I've tiled probes across both genomes, and I'd like to subtract one probe set from the other, leaving only the probes for sequences that are unique to the type2 virus. The idea is that by designing probes for all of type1 plus only the unique regions of type2, we can save money by not making "redundant" probes for sequences that are conserved in both strains.
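For reference, here is a minimal sketch of the tiling step, assuming 120 bp probes laid end to end (the `step=120` default is an assumption; a smaller step gives overlapping probes). The function name `tile_genome` is hypothetical, and the IDs only mimic the `strain1:600-720bp` style from the examples, using 0-based, end-exclusive coordinates as an assumption.

```python
def tile_genome(name, seq, probe_len=120, step=120, cover_tail=True):
    """Cut `seq` into probe_len-long windows every `step` bases.

    Returns a list of (probe_id, probe_seq) tuples with IDs like
    'strain1:600-720bp' (0-based start, end-exclusive: an assumption).
    """
    last_start = len(seq) - probe_len
    if last_start < 0:
        return []  # genome is shorter than a single probe
    starts = list(range(0, last_start + 1, step))
    if cover_tail and starts[-1] != last_start:
        # Add one extra window flush with the end so the tail is not lost.
        starts.append(last_start)
    return [(f"{name}:{s}-{s + probe_len}bp", seq[s:s + probe_len]) for s in starts]


if __name__ == "__main__":
    demo = "ACGT" * 75  # hypothetical 300 bp sequence
    for probe_id, probe in tile_genome("strain1", demo):
        print(probe_id, len(probe))
```

With end-to-end tiling, an indel in type2 shifts every downstream window relative to type1, which is why comparing probes strictly by coordinate breaks down after insertions.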
Is anyone aware of software that can do this kind of subtraction? The probes are 120 bp long, and I want to identify which probes differ by >= 5% (>= 6 bp), kind of like a reverse BLAST. Right now I'm BLASTing each probe and remaking the mismatched ones manually, but there are dozens that need to be remade, as well as entire insertions that the current process doesn't account for. Any help would be appreciated, and please feel free to ask clarifying questions.
Examples of the probes:
Sufficiently matching pair:
>strain1:600-720bp
TTGTGGCGGCATCATGTTTTTGGCATGTGTACTTGTCCTCATCGTCGACGCTGTTTTGCAGCTGAGTCCCCTCCTTGGAGCTGTAACTGTGGTTTCCATGACGCTGCTGCTACTGGCTTT
>strain2:600-720bp
TTGTGGCGGCATCATGTTTTTGGCATGTGTACTTGTCCTTATCGTCGACGCTGTTTTGCAGCTGAGTCCCCTCCTTGGAGCTGTAACTGTGGTTTCCATGACGCTGCTGCTACTGGCTTT
Mismatching pair that I would need to remake:
>strain1:5760-5880bp
CCCTCCTCAGAAAACTCTGCATGGAGAAGCTGGACGTGAACCTCCCCCCCAGACCTGTGTGCTGTATTTACAAACACTACAATAAACCCAATGTGCAAATGTGGTTTGTATGGCTACTTT
>strain2:5760-5880bp
CCCTCCTCAGAAAACTCTGCATGGAGAAGCTGGACGTGAACCTTCCCCCCCCCCCCGACCTGTGTGCTGTATTTACAAACACTACAATAAACCCAATGTGCAAATGTGGTTTGTATGGCT
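One way to express the ">= 6 bp out of 120" criterion so that insertions like the one in the second pair are counted as gaps (rather than as a run of shifted mismatches) is semi-global alignment: the edit distance between a type2 probe and its best-matching stretch anywhere in the type1 genome. The sketch below is a plain-Python version of that idea, not an existing tool; `best_match_distance` and `unique_probes` are hypothetical names, both strands are assumed to be in the same orientation (no reverse complement check), and probes are assumed to be held in a dict of ID to sequence.

```python
def best_match_distance(probe, genome):
    """Minimum edit distance between `probe` and any substring of `genome`.

    Semi-global alignment: the probe must align end to end, but genome
    bases before and after the matching stretch cost nothing.
    """
    prev = [0] * (len(genome) + 1)  # empty probe prefix matches anywhere for free
    for i, p in enumerate(probe, start=1):
        cur = [i] + [0] * len(genome)  # i probe bases aligned to nothing
        for j, g in enumerate(genome, start=1):
            cur[j] = min(
                prev[j] + 1,             # probe base aligned to a gap
                cur[j - 1] + 1,          # genome base aligned to a gap
                prev[j - 1] + (p != g),  # match (0) or substitution (1)
            )
        prev = cur
    return min(prev)  # best end position anywhere in the genome


def unique_probes(type2_probes, type1_genome, min_diff=6):
    """Keep type2 probes that differ from every stretch of the type1 genome
    by at least `min_diff` edits (6 bp = 5% of a 120 bp probe)."""
    return {
        pid: seq
        for pid, seq in type2_probes.items()
        if best_match_distance(seq, type1_genome) >= min_diff
    }
```

Each probe costs roughly len(probe) × len(genome) cell updates, which is workable in pure Python for a small viral genome and a few hundred probes but slow beyond that; a dedicated aligner such as VSEARCH or BLAST will scale much better. The final panel under this scheme would be all type1 probes plus whatever `unique_probes` returns.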
cd-hit is excellent. There's also a new tool called VSEARCH, which aims to be a free alternative to the non-free USEARCH.
I was having a hard time installing CD-HIT, so I checked out VSEARCH and managed to install it successfully. Which VSEARCH command would best address my question? In particular, is it possible to set a threshold that tolerates some mismatches, since <5% mismatch is acceptable? I'm reading through the documentation now, but some advice is always appreciated :)
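A possible starting point, offered as a sketch rather than a tested recipe: VSEARCH's `--usearch_global` search with `--id 0.95` (114/120 identical) and `--notmatched`, which writes the query sequences that found no hit at or above the threshold. Using the type2 probes as queries against the type1 probes as the database, the `--notmatched` file would hold candidate type2-unique probes. The file names and the helper `build_vsearch_command` are hypothetical. Two caveats to check in the VSEARCH manual: how identity is computed (see `--iddef`), and the boundary case, since `--id` rejects hits *below* the threshold, so a probe with exactly 6 differences (identity 0.95) would still count as matched; a value slightly above 0.95 may be needed if 6 bp should count as unique.

```python
import os
import shutil
import subprocess


def build_vsearch_command(type2_probes, type1_probes, unmatched_out, hits_out,
                          identity=0.95, threads=1):
    """Build a VSEARCH global-search command (file names are hypothetical)."""
    return [
        "vsearch",
        "--usearch_global", type2_probes,  # queries: type2 probes
        "--db", type1_probes,              # database: type1 probes
        "--id", str(identity),             # minimum pairwise identity for a hit
        "--strand", "plus",                # same orientation assumed
        "--notmatched", unmatched_out,     # queries with no hit >= identity
        "--blast6out", hits_out,           # tabular hits for inspection
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    cmd = build_vsearch_command("type2_probes.fasta", "type1_probes.fasta",
                                "type2_unique.fasta", "type2_vs_type1.tsv")
    inputs = ("type2_probes.fasta", "type1_probes.fasta")
    if shutil.which("vsearch") is None or not all(os.path.exists(p) for p in inputs):
        print("Would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
```

Because the database here is a set of probes rather than the whole type1 genome, a type2 probe shifted by an insertion may only partially overlap any end-to-end type1 probe; tiling the type1 database more densely (a smaller step) is one way to reduce such false "unique" calls.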