I'm using SequenceServer to BLAST RADseq reads against a custom BLAST database constructed from a genome which I've just assembled.
However, I'm getting a lot of hits which I do not want. I only want hits with a perfect match to the cut site of the restriction endonuclease used to make the RAD libraries (EcoRI - G'AATT,C), at the start of the hit.
Is there a way to coerce BLASTn to only return hits with a perfect match to this sequence at the beginning of the hit, but which are free to vary "normally" downstream of that sequence?
All my RADseq reads start GAATTC. (Just in case anyone is wondering - I've added a G to the beginning of all my RADseq reads to ensure that I only get hits which match the whole cut site, but I'm still getting hits which begin at e.g. position 7 of the query sequence).
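One possible workaround, sketched here rather than offered as a built-in BLAST option, is to post-filter the hits instead of constraining the search itself. The sketch assumes the search can be rerun with command-line blastn using a custom tabular format such as `-outfmt "6 qseqid sseqid qstart qend sstart send qseq sseq"`, so that the aligned query and subject strings are available; whether SequenceServer can export those columns is not something I have checked. The function names `keep_hit` and `filter_hits` are made up for illustration.

```python
import csv
import io

CUT_SITE = "GAATTC"  # EcoRI recognition sequence, as given in the question


def keep_hit(qstart, qseq, sseq, site=CUT_SITE):
    """True if the alignment starts at query position 1 and the first
    len(site) alignment columns match the site exactly on both strands
    of the alignment (a gap character '-' in that window fails the test)."""
    n = len(site)
    return (
        qstart == 1
        and qseq[:n].upper() == site
        and sseq[:n].upper() == site
    )


def filter_hits(tsv_text, site=CUT_SITE):
    """Filter tab-separated hits with the assumed column order:
    qseqid, sseqid, qstart, qend, sstart, send, qseq, sseq."""
    kept = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, qstart, qend, sstart, send, qseq, sseq = row[:8]
        if keep_hit(int(qstart), qseq, sseq, site):
            kept.append(row)
    return kept
```

Hits that begin at position 7 of the query fail the `qstart == 1` check, and hits that begin at position 1 but carry a mismatch or gap inside the cut site fail the string comparison. Everything downstream of the first six columns is left untouched, so it can vary as usual.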
Have you thought about using bbduk from the BBMap suite to extract reads that have that string instead of using BLAST?
Hi genomax
Thanks for replying so quickly. No, I hadn't considered doing that, but thanks for the suggestion! I wouldn't do it for the reads used to make the assembly, as I am looking for RAD loci which have been filtered according to a set of specific criteria, and I want to know where those loci sit in the assembly. I could do it with the assembly scaffolds, but considering that it's only a 6 character string, and some of the scaffolds are pretty big, do you not think that it would likely return a lot of undesirable sequences?
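To put a rough number on that concern, here is a back-of-the-envelope sketch. It assumes uniform, independent base composition, which real genomes do not have, so treat the result as an order-of-magnitude guide only. The helper names `expected_sites` and `count_sites` are hypothetical.

```python
def expected_sites(length, k=6):
    """Expected number of occurrences of one specific k-mer in a random
    sequence of the given length, assuming each base has probability 1/4."""
    return max(length - k + 1, 0) / 4 ** k


def count_sites(seq, site="GAATTC"):
    """Count occurrences of the site in seq, allowing overlaps.
    GAATTC is its own reverse complement, so one strand is enough."""
    seq = seq.upper()
    count = 0
    pos = seq.find(site)
    while pos != -1:
        count += 1
        pos = seq.find(site, pos + 1)
    return count
```

Under that assumption a 6-base site turns up about once every 4096 bp, so a 1 Mb scaffold would be expected to contain a couple of hundred GAATTC sites from sequence composition alone.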