Hi all,
I'm trying to retrieve a set of sequences that can be used for primer design. These sequences have some requirements, e.g. present in 4 different species, not present in a number of species on a blacklist, at least 200bp in length and preferably 50 or more copies of the sequence in all species.
So far I have about 10 sequences that meet these requirements. However, I'm unsure whether my copy-number estimates for these sequences are correct.
To estimate the number of copies, I run plain BLAST and count the hits for each sequence, keeping only hits with at least 95% identity, an alignment length >= 200bp, and non-overlapping start/end positions in the target genome.
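To make the counting step concrete, here is a minimal sketch of how it could be done. It assumes the BLAST results are in the standard 12-column tabular format (`blastn -outfmt 6`); the function name `count_copies` and the choice to merge overlapping hits on the same target sequence into a single locus are illustrative assumptions, not necessarily the only sensible way to handle overlaps.

```python
from collections import defaultdict


def count_copies(blast_lines, min_identity=95.0, min_length=200):
    """Count non-overlapping loci per query from BLAST tabular (-outfmt 6) lines.

    Assumed column order (the outfmt 6 default): qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = defaultdict(list)  # (query, subject) -> list of (start, end)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity < min_identity or length < min_length:
            continue
        # Minus-strand hits are reported with sstart > send, so normalise.
        sstart, send = int(fields[8]), int(fields[9])
        hits[(query, subject)].append((min(sstart, send), max(sstart, send)))

    copies = defaultdict(int)
    for (query, _subject), intervals in hits.items():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                # Hit starts past the current locus: count a new copy.
                copies[query] += 1
                current_end = end
            else:
                # Overlaps the current locus: extend it, do not count again.
                current_end = max(current_end, end)
    return dict(copies)
```

It can be called on an open file handle, e.g. `count_copies(open("hits.tsv"))`, where `hits.tsv` is a hypothetical file name for the BLAST output of all candidate sequences against one genome.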
Is this a correct method of assessing the number of copies throughout a genome? What other methods are available that I can use for validation?
Thanks for any tips or solutions!
Edit: Thanks for the responses so far. However, my goal here is not to design the primers myself; that will be taken care of by someone else. The primer designer gave those requirements because that is what their software requires. The eventual primers will be LAMP primers (process described at Eiken GENOME SITE). So for now I'm only interested in counting the number of copies for each of my candidate sequences.
I cannot provide a solution, but maybe these papers are a source of inspiration: