I was wondering how to obtain just the scientific name from a BLAST search. I currently have 10,000 sequences that I would like to BLAST against the nt database. I am only interested in the taxid, so I would like to end up with a CSV file that has the queryid followed by the taxid. I set up my BLASTDB, downloaded the taxdb, and put it in the same path (as suggested in the thread Scientific Names In Blast Output And Databases).
When I use the following command, I only get my queryid:
blastn -db nt -max_target_seqs 1 -outfmt '6 qseqid staxids' -query blast.fasta -task blastn
Did I forget something? How do I get a taxid output?
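For the CSV the question asks for, one possible sketch is to post-process the tabular output. The helper name `blast_tab_to_csv` and the output file name `hits.csv` are made up for illustration. It assumes exactly two tab-separated columns (qseqid, staxids), keeps only the first line per query, and writes `N/A` when the taxid column is empty.

```shell
# Hypothetical helper: convert 2-column BLAST tabular output (qseqid<TAB>staxids)
# read from stdin into CSV, keeping only the first hit line per query.
blast_tab_to_csv() {
    awk -F'\t' 'BEGIN { OFS = "," }
        !seen[$1]++ {
            # An empty second column means no taxid was reported for this hit.
            taxid = ($2 == "" ? "N/A" : $2)
            print $1, taxid
        }'
}
```

It could be used as `blastn -db nt -max_target_seqs 1 -outfmt '6 qseqid staxids' -query blast.fasta -task blastn | blast_tab_to_csv > hits.csv`. Alternatively, `-outfmt '10 qseqid staxids'` makes blastn write comma-separated values directly, although it would not drop repeated lines for the same query.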
This will sound silly, but are you just finding sequences from the species your queries come from? I don't see the problem with the command you're running. Maybe the taxonomy IDs were not included when the database was built, or the sequences you're finding don't have taxids?
Do you have a .ncbirc file in your home dir with:
Yes, I just want to know the species identity (or the closest match) of each of my sequences. There must be something wrong with the PATH to my taxdb, because other fields such as sacc, sseqid, evalue, and bitscore all work correctly.
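To test the suspicion about the taxdb path, a quick check is whether the two files unpacked from taxdb.tar.gz (`taxdb.btd` and `taxdb.bti`) sit in the directory BLAST searches. This is only a sketch: the function name `check_taxdb` is hypothetical, and it assumes BLASTDB holds a single directory rather than a colon-separated list.

```shell
# Hypothetical helper: report whether taxdb.btd and taxdb.bti are present
# in the given directory (defaults to $BLASTDB, or the current directory).
check_taxdb() {
    dir="${1:-${BLASTDB:-.}}"
    missing=0
    for f in taxdb.btd taxdb.bti; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] && echo "taxdb files found in $dir"
    # Exit status 0 when both files exist, 1 otherwise.
    return "$missing"
}
```

Running `check_taxdb "$BLASTDB"` prints which file, if any, is missing. If both files are present and the taxid column is still empty, the path is probably not the cause.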
What do you mean by 'taxdb'? I am pretty sure the taxonomy IDs are part of the BLAST database you're searching. You either have to provide the taxonomy IDs in the header or provide them in a mapping file when the database is built. I don't think BLAST links to another database for this.
Did you download the nt blast database or did you download the nr fasta data and build the database?
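Whichever way the database was obtained, one way to see whether it carries taxids at all is to dump accession and taxid pairs with `blastdbcmd` and count the records without one. The sketch below is an assumption-laden example: the helper name `count_missing_taxids` is made up, and it treats an empty field, `0`, or `N/A` as "no taxid", which may not cover every case.

```shell
# Hypothetical helper: read "accession taxid" lines on stdin and print how many
# records lack a usable taxid (empty, 0, or N/A are treated as missing).
count_missing_taxids() {
    awk '$2 == "" || $2 == "0" || $2 == "N/A" { n++ } END { print n + 0 }'
}
```

For a quick sample of the database, something like `blastdbcmd -db nt -entry all -outfmt '%a %T' | head -n 1000 | count_missing_taxids` could be run. A count close to 1000 would suggest the database was built without taxonomy information.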