Hi, I am looking for organisms in a clade (here: Firmicutes) that have one gene (say A) but lack another (B). While I can easily find organisms that have gene A via a BLAST search, I am struggling with the second constraint. Manually checking each organism found for A in a subsequent BLAST search with B as the query is obviously not an option. I tried BLASTing gene B too and comparing the lists of resulting organisms, but since the lists are incomplete and only show the first 50/100 hits, this approach was unsuccessful.
Any suggestions on this problem? Thanks in advance.
Also note that both genes A and B may have multiple paralogs and are poorly/inconsistently annotated, so the comparison needs to be done at the sequence level.
I think BLAST is optimized for small queries and large databases. I am currently BLASTing my two genes locally against the UniProt/TrEMBL database, using -m 9 to give me just a result table and high -v and -b values (both 10000). Hopefully this value is high enough to give me all significant hits. Then it would just be a matter of matching the results to organisms (provided in the FASTA annotation). Fingers crossed...
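A minimal sketch of that matching step in Python, in case it helps anyone with the same problem. It makes several assumptions that are not part of the setup described above: the result table has the usual 12 tab-separated columns (subject ID in column 2, e-value in column 11), the subject IDs in the table match the first word of the FASTA headers, the headers carry the organism in a UniProt-style OS= field, and 1e-5 is an acceptable e-value cutoff. All function names are made up for illustration.

```python
import re

# UniProt-style headers carry the organism as "OS=<name>", followed by
# further KEY=value fields (OX, GN, PE, SV). Assumed header layout.
OS_FIELD = re.compile(r"OS=(.+?)(?= (?:OX|GN|PE|SV)=|$)")


def organisms_by_id(fasta_lines):
    """Map each sequence ID (first word of a FASTA header) to its organism."""
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        if not header:
            continue
        seq_id = header.split(None, 1)[0]
        match = OS_FIELD.search(header)
        if match:
            mapping[seq_id] = match.group(1)
    return mapping


def significant_subjects(blast_lines, max_evalue=1e-5):
    """Collect subject IDs from tabular BLAST output at or below the cutoff."""
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the comment lines of -m 9
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a regular hit line
        if float(fields[10]) <= max_evalue:  # column 11 = e-value
            hits.add(fields[1])              # column 2 = subject ID
    return hits


def organisms_with_a_not_b(hits_a, hits_b, id_to_organism):
    """Organisms with at least one hit for A and no hit for B."""
    orgs_a = {id_to_organism[s] for s in hits_a if s in id_to_organism}
    orgs_b = {id_to_organism[s] for s in hits_b if s in id_to_organism}
    return orgs_a - orgs_b
```

Each function takes any iterable of lines, so open file handles for the database FASTA file and the two result tables can be passed in directly. Because the comparison is a set difference on organism names, paralogs do not matter: one significant hit for B is enough to exclude an organism. Note that the OS= name includes the strain where UniProt provides it, so different strains of one species are treated as different organisms.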
I think you are right to BLAST the other way around - my bad.
Just for the record: the strategy works fine although it requires a lot of time/disk space and there is still some manual work needed.
Thanks for letting me know! Yeah, that is the curse we all have to bear. Sometimes I have to wait days for a computation, wasting tons of space and network traffic, just to re-run the pipeline because some parameter was set wrong.