I am using a command like
NcbiblastnCommandline(query="fasta.fasta", db="nt", outfmt=5, out="out.xml", show_gis=True)
And that is returning results.
However, I want to limit the results to only returning the gi number/taxID (not the alignment or the bulk of the data) and to limit the hitlist size (as you would in NCBIWWW) to some arbitrary number. How can these be done?
Ultimately, I am trying to find related species to a given target that aren't the target and download their sequences. Since BLAST can't provide the complete genomes, I want to take their identifiers, so I don't need most of the BLAST output.
I'm assuming you're running BLAST from Python because you have upstream/downstream code that does other processing. However, it would be much easier to run BLAST on the command line, or to use the
subprocess
module in Python to call your BLAST command. Output format 6 (tab-delimited) is the easiest to parse, IMO. You can set
max_target_seqs
to cap the number of database hits returned for each query sequence.
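As a rough sketch of that approach (not tested against your data): format 6 also accepts a custom column list, so you can ask for just the identifiers instead of the full alignment. The example below assumes blastn is on your PATH and that the database carries taxonomy information; otherwise staxids may come back as N/A. GI numbers may also be empty in newer databases, in which case sacc (accession) is an alternative column. The function names and the choice of columns are my own, for illustration only.

```python
import subprocess

# Columns requested from BLAST: query ID, subject GI, subject taxonomy ID(s).
# This column choice is an assumption; swap "sgi" for "sacc" to get accessions.
FIELDS = ["qseqid", "sgi", "staxids"]


def build_blastn_cmd(query, db, max_hits):
    """Build the blastn argument list (hypothetical helper)."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        # Format 6 is tab-delimited; the names after it select the columns.
        "-outfmt", "6 " + " ".join(FIELDS),
        # Limit the number of database hits kept per query sequence.
        "-max_target_seqs", str(max_hits),
    ]


def parse_hits(tabular_text):
    """Turn tab-delimited BLAST output into a list of dicts."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(FIELDS, line.split("\t")))
        # staxids can hold several IDs separated by ';'
        row["staxids"] = row["staxids"].split(";")
        hits.append(row)
    return hits


def run_blastn(query, db, max_hits=10):
    """Run blastn and return the parsed hits (requires BLAST+ installed)."""
    result = subprocess.run(
        build_blastn_cmd(query, db, max_hits),
        capture_output=True,
        text=True,
        check=True,  # raise if blastn exits with an error
    )
    return parse_hits(result.stdout)
```

You would then call something like run_blastn("fasta.fasta", "nt", max_hits=20) and filter out the rows whose taxID matches your target species before downloading the rest.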