I am designing PCR primers to amplify a region of the 18S rRNA gene of Penicillium expansum. As the template for primer design, I use the consensus sequence of a multiple sequence alignment of 18S sequences obtained from the SILVA database.
When I test the primers with Primer-BLAST against the nucleotide collection (nt), it finds the expected targets. I also ran Primer-BLAST against the P. expansum genome assemblies available through NCBI Datasets. However, the expected product was found in only one of the nine assemblies (GCA_004302965.1).
Why does Primer-BLAST give (false) negative results here? Is there a known reason why 18S sequences might be missing from genome assemblies?
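To make concrete what a "product found" check requires, here is a minimal in-silico PCR sketch in Python. It is not Primer-BLAST's actual algorithm. It looks for the forward primer and the reverse complement of the reverse primer on each strand and reports the product between them. The function names, the mismatch model (plain substitutions, no gaps, no extra weight on the 3' end) and the `max_len` default are assumptions for illustration. One consequence worth keeping in mind: both primer sites have to lie on the same contig, so a target split across contig ends gives no product even if parts of the gene are present.

```python
# Minimal in-silico PCR sketch (illustrative; not Primer-BLAST's algorithm).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(template: str, primer: str, max_mismatches: int = 0) -> list[int]:
    """Start positions where primer matches template with <= max_mismatches substitutions."""
    template, primer = template.upper(), primer.upper()
    k = len(primer)
    sites = []
    for i in range(len(template) - k + 1):
        mismatches = sum(a != b for a, b in zip(template[i:i + k], primer))
        if mismatches <= max_mismatches:
            sites.append(i)
    return sites


def predict_products(template: str, fwd: str, rev: str,
                     max_mismatches: int = 0,
                     max_len: int = 5000) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for every predicted product.

    Primers are given 5'->3'. The reverse primer anneals to the opposite
    strand, so its reverse complement is searched on the same strand as
    the forward primer. Both strands of the template are scanned.
    """
    rev_rc = revcomp(rev)
    products = []
    for strand, seq in (("+", template), ("-", revcomp(template))):
        fwd_sites = find_sites(seq, fwd, max_mismatches)
        rev_sites = find_sites(seq, rev_rc, max_mismatches)
        for start in fwd_sites:
            for site in rev_sites:
                end = site + len(rev_rc)
                # Forward site must precede the reverse site, within max_len.
                if start < site and end - start <= max_len:
                    products.append((strand, start, end))
    return products
```

Coordinates for "-" hits refer to the reverse-complemented sequence. Scanning every position in pure Python is slow on a whole genome, so this is better suited to individual contigs or extracted regions.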
It looks like that assembly (GCA_004302965.1) is the newest one. Could it be that the others are too fragmented and are missing the gene, or the relevant portions of it?
That might be an explanation. GCA_004302965.1 does have the highest N50 of all available assemblies, and it was sequenced with PacBio. However, as far as I know, the 18S rRNA gene is present in multiple copies in many eukaryotes, so I would still be surprised if every copy were fragmented or missing in the other assemblies.
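For reference, N50 is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length, which is why it is used as a rough measure of how fragmented an assembly is. A minimal Python sketch, assuming you already have the contig lengths (for example from a FASTA index); the function name is illustrative:

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        # Add contigs from longest to shortest until half the total is reached.
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8, because the two 8s alone cover 16 of the 32 bases.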
My goal is to avoid false negatives in Primer-BLAST caused by targets that are missing from the assembly. Do you know of a way to check whether a gene is present in a given assembly?
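One way to check presence independently of primer matching is to align a full-length reference 18S sequence to each assembly (for example with BLASTN) and see whether any contig covers it. As a dependency-free sketch of the same idea, the Python below computes k-mer containment: the fraction of the query's k-mers, on either strand, that occur anywhere in the assembly. A value near 1 suggests the gene is present, an intermediate value suggests partial or split copies, and a value near 0 suggests it is absent. The function names and the `k=31` default are assumptions for illustration; this is a quick proxy, not a substitute for a proper alignment.

```python
from typing import Iterable, Iterator

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (record_id, uppercase sequence) from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def containment(query: str, contigs: Iterable[str], k: int = 31) -> float:
    """Fraction of the query's k-mers found on either strand of any contig."""
    query = query.upper()
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    if not query_kmers:
        raise ValueError("query is shorter than k")
    # Include reverse complements so hits on the minus strand are counted.
    wanted = query_kmers | {revcomp(x) for x in query_kmers}
    seen = set()
    for seq in contigs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in wanted:
                seen.add(kmer)
    hits = sum(1 for x in query_kmers if x in seen or revcomp(x) in seen)
    return hits / len(query_kmers)
```

As a usage sketch with hypothetical names, open the assembly FASTA (say `assembly.fna`), collect `[seq for _, seq in parse_fasta(fh)]` into `contigs`, and call `containment(reference_18s, contigs)`, where `reference_18s` is a full-length 18S sequence such as your SILVA consensus. Only the small set of query k-mers is kept in memory, so the scan is slow but light on RAM.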
I find this rather strange too. The reference assembly (ASM76974v1) doesn't seem to contain an 18S sequence either. I searched it with this sequence and got no hits:
Maybe you could try contacting the submitters to see what's going on?