local blastx output
6.4 years ago
agata88 ▴ 870

Hi all!

I was running local blast command for transcripts fasta sequences as below:

blastx -query input/file.fasta -task blastx-fast -db blast-db-nr/nr -out output/file_blast_results.txt -evalue 0.001 -max_target_seqs 1 -num_threads 30 -outfmt '6 qaccver saccver pident length evalue qstart qend sstart send staxid ssciname scomname sblastname' > blast.log 2>&1 &

I downloaded nr database from ftp ncbi site and indexed it by makeblastdb script. As a result I have a list of accession numbers, percent identity, length etc., but the last description columns are all NA.

My questions are:

  1. How can I get organism name, description of protein etc. from a list of accession numbers like WP_083411507.1, CBW15324.1?

  2. I am observing that I have blast results for pig protein while my experiment includes only bacteria - how can I select only prokaryotic nr part of database?

  3. In next step I would like to assign GO numbers to blast results, any idea how to do that?

Many thanks for any suggestions,

Best, Agata

PS. Input includes ~3000 nucleotide sequences.


I downloaded nr database from ftp ncbi site and indexed it by makeblastdb script.

Small nitpick. You either downloaded the premade indexes (in which case you don't need to makeblastdb) or you downloaded the fasta sequences for nr (in that case you would need to makeblastdb).

sscinames (I think you missed an s above) and stitle should give you the two pieces of information you are requesting. You can probably use the NCBI unix utils (EDirect) to get them after the fact if you don't want to re-do the search.

efetch -db protein -id "WP_083411507.1,CBW15324.1" -format docsum | xtract -pattern DocumentSummary -element Caption -element Title
WP_083411507    hypothetical protein [Arthrobacter sp. UCD-GKA]
CBW15324        unnamed protein product [Haemophilus parainfluenzae T3T1]
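With ~3000 queries the same lookup can be batched instead of run one accession at a time. A minimal sketch, assuming the tabular output keeps the subject accession in column 2 (as in the -outfmt string in the question). The function name unique_subjects and the file names are made up for illustration, and the efetch loop is left commented out because it needs EDirect and network access:

```shell
#!/usr/bin/env bash
# Print the unique subject accessions (column 2) of a tab-separated
# BLAST outfmt 6 file, one per line.
unique_subjects() {
    cut -f2 "$1" | sort -u
}

# Hypothetical usage (file names are placeholders):
#   unique_subjects output/file_blast_results.txt > accessions.txt
#
# Look the accessions up in chunks of 100 so the -id argument stays short:
#   xargs -n 100 < accessions.txt | tr ' ' ',' | while read -r ids; do
#       efetch -db protein -id "$ids" -format docsum |
#           xtract -pattern DocumentSummary -element Caption -element Title
#   done
```

Deduplicating first matters here because many transcripts tend to hit the same protein, so the number of lookups is usually much smaller than the number of result rows.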

For the GO assignment, Blast2GO would be one possibility.


Thanks, I downloaded nr database from here: ftp://ftp.ncbi.nlm.nih.gov/blast/db//FASTA/nr.gz

Yes, that is correct, I missed the "s" at the end of ssciname, scomname and sblastname.

I am trying to avoid Blast2GO since it is not open source, which is why I decided to run blast locally on my own.


How can I get organism name

Obtain NCBI Taxonomy ID from local blast output

Do not forget you also need TaxDB ( ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz )
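One common cause of N/A in the name columns is that BLAST cannot find the taxdb files (taxdb.btd and taxdb.bti), which it looks for in the working directory or in the directory named by BLASTDB. A minimal sketch of unpacking them next to the database, assuming the archive above has already been downloaded; install_taxdb and the paths are illustrative names, not part of BLAST+. Note that this only helps if the database itself carries taxonomy IDs, which a database built from plain FASTA may not:

```shell
#!/usr/bin/env bash
# Unpack a previously downloaded taxdb archive into the BLAST database
# directory and print that directory as an absolute path.
install_taxdb() {
    local tarball="$1" db_dir="$2"
    mkdir -p "$db_dir"
    tar -xzf "$tarball" -C "$db_dir"
    (cd "$db_dir" && pwd)
}

# Hypothetical usage (paths are placeholders):
#   export BLASTDB="$(install_taxdb taxdb.tar.gz blast-db-nr)"
```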

how can I select only prokaryotic nr part of database?

You may use -gilist (but GI numbers have been deprecated) or -seqidlist.

 -gilist <String>
   Restrict search of database to list of GI's
    * Incompatible with:  negative_gilist, seqidlist, negative_seqidlist,
   remote, subject, subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqId's
    * Incompatible with:  gilist, negative_gilist, negative_seqidlist, remote,
   subject, subject_loc

There are several ways of getting taxon-specific accessions, for example:

Extract all protein sequences of specific taxons from the NCBI nr database
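One such route, sketched under assumptions: if the database carries taxonomy IDs, blastdbcmd can print an accession and taxid per sequence (-outfmt '%a %T'), and that table can be filtered against a list of wanted taxids to produce a file for -seqidlist. The helper filter_by_taxid and all file names below are hypothetical, and the list of bacterial taxids is assumed to exist already (for example, exported from NCBI Taxonomy):

```shell
#!/usr/bin/env bash
# Keep accessions whose taxid (column 2) appears in a taxid list file.
#   $1: file with one wanted taxid per line
#   $2: whitespace-separated "accession taxid" table
filter_by_taxid() {
    awk 'NR == FNR { keep[$1] = 1; next } ($2 in keep) { print $1 }' "$1" "$2"
}

# Hypothetical usage (file names are placeholders):
#   blastdbcmd -db blast-db-nr/nr -entry all -outfmt '%a %T' > acc2taxid.txt
#   filter_by_taxid bacteria_taxids.txt acc2taxid.txt > bacteria.seqids
#   blastx -query input/file.fasta -db blast-db-nr/nr -seqidlist bacteria.seqids ...
```

For a database built locally with makeblastdb, -seqidlist may additionally require that the database was created with -parse_seqids.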

In next step I would like to assign GO numbers to blast results

Use Blast2GO, dammit, Trinotate...
