Hello, I'm running BLAST on a local server. Even though I only want to check a small set of taxa (70 species), I run BLAST against the full nr database and only afterwards filter the results by taxon ID, for the sake of the integrity of the E-values I get (if you think I could do this differently, I would be delighted to know!). To get as many BLAST hits as possible I use "-max_target_seqs 1000000000".
For some species I get a lot of hits, while others get no hits at all, even though they are not very distant from each other. The presence/absence of hits is consistent across the 13 different genes I checked. Can I assume that the absence of hits has true biological meaning, or could this be a matter of genome quality and other technical factors? If it is the latter, how can I select the right genomes to check if I want to get a reliable comparative picture between different taxa?
I would appreciate any answer, thanks! Efrat
It is not clear (to me) what you are trying to do here. You have sequences from 70 species (is that your input)? And your aim is to find the taxids for these species?
Thank you for replying! I'm blasting a certain gene against 70 different species. Since local BLAST does not let you define in advance the species you want to search against (unless you create a separate database, which will change the E-values), I first run BLAST against the whole database. Only afterwards do I filter the results by the desired taxids. My question is: how do I choose the right taxids? I'm looking for species whose genome quality is good enough that, if BLAST returns no hits, I can deduce that there is no homolog (rather than the absence being due to low sequence quality or other technical issues). Hope I'm clearer this time! Thanks
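For readers following along, the "search everything, then filter by taxid" step described above could look roughly like the sketch below. It assumes the search was written as tab-separated output with the subject taxids in the last column, for example with `-outfmt "6 qseqid sseqid evalue bitscore staxids"`; that column layout, the function name `filter_hits_by_taxid`, and the example rows are all made up for illustration and are not taken from the original post.

```python
import csv
import io

# Assumed column layout (hypothetical), matching
# -outfmt "6 qseqid sseqid evalue bitscore staxids"
COLUMNS = ["qseqid", "sseqid", "evalue", "bitscore", "staxids"]


def filter_hits_by_taxid(handle, wanted_taxids):
    """Yield hit rows (as dicts) whose staxids field contains a wanted taxid."""
    wanted = {str(t) for t in wanted_taxids}
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        # staxids may hold several IDs separated by ';' when one
        # database entry covers sequences from more than one taxon
        hit_taxids = set((row["staxids"] or "").split(";"))
        if hit_taxids & wanted:
            yield row


# Made-up example rows standing in for a real BLAST output file
example = (
    "geneA\tsp|P1\t1e-50\t200\t9606\n"
    "geneA\tsp|P2\t1e-20\t95.1\t10090;10116\n"
    "geneA\tsp|P3\t1e-10\t60.2\t7227\n"
)
kept = list(filter_hits_by_taxid(io.StringIO(example), {9606, 10116}))
print([row["sseqid"] for row in kept])
```

With a real file, the `io.StringIO(example)` part would be replaced by an open file handle. Because the E-values are left exactly as BLAST reported them, they still reflect the search against the whole database.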
You have 70 genomes at hand then. Since you collected those, you must have some information about their quality/status from the sources. Since you don't have control over the data or its quality, I am not sure there can be a "right" set of taxids.
As you point out, when there is no hit, all you can conclude from the data at hand is that you were not able to find a homolog. That does not mean one does not exist, especially if the genomes are not "finished". The only way to conclusively prove or eliminate that possibility would be additional experimentation (PCR, sequencing, etc.).
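One way to look at the pattern before drawing any conclusions is to tabulate presence/absence of hits per gene and per taxid, and list the taxa that have no hit for any gene. The sketch below only summarizes the filtered results; an empty row does not by itself distinguish a real loss from an incomplete genome. The function names, gene names, and taxids are hypothetical.

```python
from collections import defaultdict


def presence_matrix(hits, genes, taxids):
    """Build {taxid: {gene: bool}} from an iterable of (gene, taxid) hit pairs."""
    seen = defaultdict(set)
    for gene, taxid in hits:
        seen[taxid].add(gene)
    return {t: {g: g in seen[t] for g in genes} for t in taxids}


def taxa_without_any_hit(matrix):
    """Return the taxids that have no hit for any of the genes."""
    return [t for t, row in matrix.items() if not any(row.values())]


# Made-up data: three query genes, three taxa of interest
genes = ["geneA", "geneB", "geneC"]
taxids = ["9606", "10090", "7227"]
hits = [("geneA", "9606"), ("geneB", "9606"), ("geneC", "9606"), ("geneA", "10090")]

matrix = presence_matrix(hits, genes, taxids)
print(taxa_without_any_hit(matrix))
```

Taxa that come back empty for every gene are the ones whose assembly and annotation status is worth checking at the source before reading the absence as biology.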
I see, thank you very much for the help