7.7 years ago
Penny Liu
I want to narrow down a blastn search against the nt database using a gilist.
I already got all taxids of bacteria (taxid 2) and extracted the GIs with csvtk (please refer to this).
The next step was to proceed with bacterial species identification.
When I run
blastn -query query.fasta -db /path/to/nt -gilist bacteria.taxid.gi.txt -evalue 1e-6 -outfmt 6 -out sequences.txt
An error occurred:
BLAST Database error: Specified file is not a valid GI/TI list.
Please refer to the attached file.
bacteria.taxid.gi.txt (Number of taxids: 309,264,110)
What am I doing wrong? Thanks for the help in advance.
Hello! I see a couple of possible problems:
1) Your GI list file is too large (3 Gb). As far as I remember, BLAST has some limits.
2) BLAST cannot find the file, since you put it here: http://bioinfo.cs.ccu.edu.tw/CCU_bioinf/bacteria.taxid.gi.txt If you run blast in the same directory as the file, it's OK.
3) Your list of GIs has a header, gi, which is not a GI number, right?
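For point 3, here is a minimal sketch of one way to strip such a header, assuming the text GI list is expected to hold one integer GI per line. The function names are hypothetical, and it streams the file line by line because the list is several gigabytes.

```python
from typing import Iterable, Iterator


def clean_gi_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that consist purely of ASCII digits (one GI per line)."""
    for line in lines:
        token = line.strip()
        # Skip blank lines and non-numeric entries such as a "gi" column header.
        if token.isascii() and token.isdigit():
            yield token


def clean_gi_file(src: str, dst: str) -> int:
    """Stream src to dst, keeping numeric lines only; return the number of GIs kept."""
    kept = 0
    with open(src) as fin, open(dst, "w") as fout:
        for gi in clean_gi_lines(fin):
            fout.write(gi + "\n")
            kept += 1
    return kept
```

Calling clean_gi_file("bacteria.taxid.gi.txt", "bacteria.gi.clean.txt") would write a cleaned copy (the output file name is just an example), which could then be passed to -gilist instead of the original file.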
You're right. The word gi is redundant. I removed it from the text file, and that solved the problem. :)
Hi Yi-Ting, refer to here: C: Extract all bacteria sequences from the nr database
Hi Yi-Ting, can I ask how you got this bacteria GI list? I am trying to download it directly from NCBI (by 'save to file' -> GI List etc.), but it fails due to a timeout error. Do you have an easy way to do that? Thanks in advance.
My extraction method is the same as yours. The process can take several hours to complete. I added multiple keywords (term=whole+genome+bacteria) to narrow down the search scope.