Hi all!
I ran a local BLAST search on transcript FASTA sequences with the command below:
blastx -query input/file.fasta -task blastx-fast -db blast-db-nr/nr -out output/file_blast_results.txt -evalue 0.001 -max_target_seqs 1 -num_threads 30 -outfmt '6 qaccver saccver pident length evalue qstart qend sstart send staxid ssciname scomname sblastname' > blast.log 2>&1&
I downloaded the nr database from the NCBI FTP site and indexed it with makeblastdb. As a result I have a list of accession numbers, percent identity, length, etc., but the last columns (the descriptions) are all NA.
My questions are:
How can I get the organism name, protein description, etc. from a list of accession numbers like WP_083411507.1, CBW15324.1?
I see BLAST hits to pig proteins although my experiment includes only bacteria. How can I search only the prokaryotic part of the nr database?
In the next step I would like to assign GO terms to the BLAST results. Any idea how to do that?
Many thanks for any suggestions,
Best, Agata
PS. The input includes ~3000 nucleotide sequences.
Small nitpick. You either downloaded the premade indexes (in which case you don't need to makeblastdb) or you downloaded the fasta sequences for nr (in that case you would need to makeblastdb).
sscinames (I think you missed an "s" above) and stitle should give you the two pieces of information you are requesting. You can probably use NCBI unix utils to get them after the fact if you don't want to re-do the search.
Blast2GO would be one possibility.
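If you would rather look the accessions up after the fact from the local database instead of over the network, something along these lines might work. This is only a sketch: demo_blast.tsv and accessions.txt are made-up file names, the demo rows are invented apart from the two accessions from the question, and I am assuming the database was built with -parse_seqids so that lookups by accession are possible. As far as I know, %T and %S will only be filled in if the database carries taxids (premade NCBI databases do; a makeblastdb run on plain FASTA without -taxid_map does not), while %t (the title) should be available either way.

```shell
#!/bin/sh
# Hypothetical stand-in for output/file_blast_results.txt
# (tab-separated; column 2 is saccver). The numbers are invented.
printf 'tx1\tWP_083411507.1\t98.5\t200\t1e-50\n'  > demo_blast.tsv
printf 'tx2\tCBW15324.1\t91.0\t150\t2e-30\n'     >> demo_blast.tsv
printf 'tx3\tWP_083411507.1\t88.0\t120\t5e-20\n' >> demo_blast.tsv

# Collect the unique subject accessions, one per line.
cut -f2 demo_blast.tsv | sort -u > accessions.txt
cat accessions.txt

# Build a tab-separated format string:
# accession, taxid, scientific name, title.
fmt=$(printf '%%a\t%%T\t%%S\t%%t')

# Query the local database only if BLAST+ is installed.
if command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -db blast-db-nr/nr -entry_batch accessions.txt -outfmt "$fmt" \
        || echo "lookup failed (database missing or built without -parse_seqids?)" >&2
else
    echo "blastdbcmd not found; skipping lookup" >&2
fi
```

For nr, the title usually ends with the organism name in square brackets, so even without taxids the title alone often answers the organism question.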
Thanks, I downloaded nr database from here: ftp://ftp.ncbi.nlm.nih.gov/blast/db//FASTA/nr.gz
Yes, that is correct, I missed the "s" at the end of sciname, scomname and sblastname.
I am trying to avoid Blast2GO since it is not open source; that is why I decided to run BLAST locally on my own.
Obtain NCBI Taxonomy ID from local blast output
Do not forget you also need TaxDB ( ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz )
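A rough sketch of how the TaxDB files could be checked for and fetched. The function names have_taxdb and fetch_taxdb are just made up for illustration, and I am assuming the database lives in blast-db-nr as in the command from the question. To my knowledge BLAST+ looks for taxdb.btd and taxdb.bti in the current directory and in the directories listed in BLASTDB.

```shell
#!/bin/sh
DB_DIR="blast-db-nr"   # assumed database directory, taken from the question
TAXDB_URL="ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz"

# Succeeds only if both taxdb files are present in the given directory.
have_taxdb() {
    [ -f "$1/taxdb.btd" ] && [ -f "$1/taxdb.bti" ]
}

# Download and unpack taxdb into the given directory.
# Needs network access, so it is defined here but not called automatically.
fetch_taxdb() {
    mkdir -p "$1" &&
    curl -fsSL "$TAXDB_URL" -o "$1/taxdb.tar.gz" &&
    tar -xzf "$1/taxdb.tar.gz" -C "$1"
}

if have_taxdb "$DB_DIR"; then
    echo "taxdb found in $DB_DIR"
else
    echo "taxdb missing in $DB_DIR; run: fetch_taxdb $DB_DIR"
fi

# Point BLAST+ at the directory that holds both the database and taxdb.
export BLASTDB="$PWD/$DB_DIR"
```

Note that taxdb only maps taxids to names; the hits still need taxids stored in the database itself for the name columns to be filled.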
You may use -gilist (but GI numbers have been deprecated) or -seqidlist.
There are several ways of getting taxon-specific accessions, for example:
Extract all protein sequences of specific taxons from the NCBI nr database
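Once you have a list of bacterial accessions, a sketch like the following shows two ways to use it. Everything here is illustrative: bacterial_ids.txt, demo_blast.tsv and the output file names are made up, and XP_000000001.1 is a placeholder standing in for a non-bacterial hit. The first part simply filters an existing tabular result; the second reruns the search with -seqidlist, which is the cleaner option because with -max_target_seqs 1 a post-hoc filter would drop queries whose only reported hit was non-bacterial.

```shell
#!/bin/sh
# Hypothetical list of bacterial accessions, one per line.
printf 'WP_083411507.1\nCBW15324.1\n' > bacterial_ids.txt

# Hypothetical tabular BLAST output (column 2 is saccver); numbers invented.
printf 'tx1\tWP_083411507.1\t98.5\t200\t1e-50\n'  > demo_blast.tsv
printf 'tx2\tXP_000000001.1\t75.0\t180\t1e-10\n' >> demo_blast.tsv
printf 'tx3\tCBW15324.1\t91.0\t150\t2e-30\n'     >> demo_blast.tsv

# Option 1: keep only rows whose subject accession is in the list.
awk -F'\t' 'NR == FNR { keep[$1]; next } $2 in keep' \
    bacterial_ids.txt demo_blast.tsv > bacterial_hits.tsv
cat bacterial_hits.tsv

# Option 2: restrict the search itself (runs only if BLAST+ and the
# query file from the question are present).
if command -v blastx >/dev/null 2>&1 && [ -f input/file.fasta ]; then
    blastx -query input/file.fasta -task blastx-fast -db blast-db-nr/nr \
        -seqidlist bacterial_ids.txt -evalue 0.001 -max_target_seqs 1 \
        -num_threads 30 -outfmt 6 -out output/file_blast_bacteria.txt \
        || echo "restricted search failed" >&2
fi
```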
Use Blast2GO, dammit, Trinotate...