Dear all,
I am running tblastn with a protein query against the WGS databases at NCBI, to search directly in the genomes of some taxa.
The protein is not present in every genome, and I would like to be able to say "Protein X is present in n organisms out of the N in this lineage." For that I need N, the total number of sequenced genomes per taxon.
I have found two ways that give quite different results.
1. Run tblastn with the protein against, say, "Arthropoda", and read the number shown in the corresponding field of the output page: "wgs (676 databases)".
2. Use this page and read the number there. For "Arthropoda", it is 552.
Do you know of any other command-line or online tool to get N? Ideally it would take an NCBI taxonomy ID as input (6656 for Arthropoda).
Thanks for your help!
@cpad0112 thank you for your help. What is this line doing exactly? I cannot get the number of available genomes for Arthropoda, for instance.
It is doing something similar to what I did below using a different program and eliminating a couple of lines at the beginning of that file.
Did you see my note below?
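For the taxonomy-ID route asked about in the question, one possible approach (not the command discussed in the comments above) is to ask NCBI's E-utilities ESearch endpoint for a record count. The sketch below is only that: it assumes the Assembly database is an acceptable proxy for "sequenced genomes", and the function names are made up for illustration. Note that it counts assembly records, not organisms, so a species with several assemblies is counted several times, and the result will not necessarily match either the 676 or the 552 figure.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(taxid, db="assembly"):
    """Build an ESearch URL that only asks for the number of matching records.

    txid<N>[Organism:exp] matches the taxon and everything below it.
    The choice of db="assembly" is an assumption, not something from the thread.
    """
    term = f"txid{int(taxid)}[Organism:exp]"
    query = urllib.parse.urlencode({"db": db, "term": term, "rettype": "count"})
    return f"{EUTILS_ESEARCH}?{query}"


def parse_count(xml_text):
    """Extract the integer in the <Count> element of an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    node = root.find("Count")
    if node is None or node.text is None:
        raise ValueError("no <Count> element in ESearch response")
    return int(node.text)


def count_records(taxid, db="assembly"):
    """Query NCBI (needs network access) and return the record count."""
    url = build_esearch_url(taxid, db)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling `count_records(6656)` would then return the current number of Assembly records under Arthropoda. To get closer to "one genome per organism", the returned records would still need to be collapsed by species, which this sketch does not attempt.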