23 months ago
GR
Hi All,
What is the best way to come up with a 30 bp sequence that has no hit anywhere in the genome? Even better if it still shows no hit when 1 or 2 bp mismatches are allowed.
Thanks!
I'd make millions of random 30-mers and align them to the given genome with bowtie, using the most lenient settings. Unmapped sequences will meet your criteria. It's pretty much an unsophisticated sledgehammer method, but easy to do with existing tools and some bash-fu, without reinventing anything.
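For the candidate-generation step, here is a minimal Python sketch rather than a tested pipeline. The function names, the output file name `random_30mers.fa`, and the seed are all made up for illustration. The `has_hit` helper is a brute-force check that is only practical for a tiny toy sequence; for a real genome you would hand the FASTA file to the aligner instead. The bowtie command in the comment is an assumption based on Bowtie 1's `-f` (FASTA input), `-v` (maximum mismatches) and `--un` (write unaligned reads) options, so check it against your installed version.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_kmers(n, k=30, seed=0):
    """Return n random DNA k-mers drawn uniformly from A, C, G, T."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(k)) for _ in range(n)]


def write_fasta(kmers, path):
    """Write the k-mers to a FASTA file, one record per k-mer."""
    with open(path, "w") as handle:
        for i, seq in enumerate(kmers):
            handle.write(f">kmer_{i}\n{seq}\n")


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_hit(kmer, genome, max_mismatches=2):
    """Brute-force scan of both strands; suitable for toy sequences only."""
    k = len(kmer)
    strands = (genome, genome.translate(_COMPLEMENT)[::-1])
    return any(
        hamming(kmer, strand[i:i + k]) <= max_mismatches
        for strand in strands
        for i in range(len(strand) - k + 1)
    )


if __name__ == "__main__":
    candidates = random_kmers(1_000_000, k=30, seed=42)
    write_fasta(candidates, "random_30mers.fa")
    # Assumed next step (verify flags for your bowtie version):
    #   bowtie -f -v 2 --un unmapped.fa <index_prefix> random_30mers.fa > /dev/null
    # Sequences left in unmapped.fa had no alignment with up to 2 mismatches.
```

Fixing the seed keeps the candidate set reproducible, which helps if you need to rerun the alignment with different settings.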