4.6 years ago
tom5
Hi, I am trying to run BLAST+ alignment remotely, but the server keeps logging me out. I think a better strategy would be to run BLAST+ locally with a database. I am performing protein BLAST alignment on chicken and mouse genes, so I would like to set up a local version of a database (such as the nr database) with just those organisms. Please let me know if there's a way to do this.
Thank you for your help. I looked at the tutorial and manual you linked and I don't think they fully answered my question. I am trying to run protein BLAST alignment locally through BLAST+ and want to download the nr database. However, due to the large size of the database, I'd like to only download the portion of the dataset that corresponds to organisms I am working with: Mouse and chicken.
I looked at the NCBI guide and it gives a command to download databases: update_blastdb.pl --decompress nr [*]
However, I am not sure how to specify an organism-specific download. The tutorial you recommended suggests using makeblastdb to generate a database from FASTA files. How do I get the correct files to do so? Please let me know if you have a recommendation. Have a good evening.
No, you can't do that. You are best off downloading mouse and chicken genome FASTA files from NCBI Datasets and then creating the database yourself.
Hi, thanks again, this is a valuable resource. Unfortunately I am uncertain how to go from here to my BLAST database. What I've done so far is:
I'm not sure where my error is but I suspect I did not generate the dataset correctly. Please let me know if you can help.
Another issue I had is downloading the Gallus gallus (chicken) dataset from the NCBI Datasets web interface. When I tried to download, it redirected me to an empty page and the download did not start.
Files with the .fna extension are genomic DNA files. If you want to search with protein queries against them, you will need to use tblastn instead of blastp. I don't suggest doing that with raw eukaryotic genomes because of splicing. Instead, you can find files with the .faa extension, which contain proteins. Looks like you already have protein.faa in your collection, so use that as input for makeblastdb and for a subsequent blastp search.
As to why nothing happens in your search: a DNA sequence can pose as a fake protein sequence, since all DNA bases are legitimate protein residues (A, C, G and T are also the one-letter codes for alanine, cysteine, glycine and threonine). Given that DNA databases are at least 3x larger than the corresponding protein databases (and in the case of eukaryotes more like 50-100x larger), the search you initiated is simply taking a long time. It would likely finish if you gave it some time, though the results would be meaningless, as you are comparing a protein sequence to a DNA database acting as a fake protein database.
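To make the advice above concrete, here is a minimal Python sketch. It first guesses whether a FASTA file is really DNA (so a .fna file is not built into a protein database by mistake) and then assembles the makeblastdb and blastp command lines. The database name mouse_chicken_prot, the output file hits.tsv and the query file name are made up for illustration, and the 90% cutoff is an arbitrary heuristic, not something BLAST+ itself uses.

```python
DNA_LETTERS = set("ACGTUN")  # all of these are also valid one-letter amino acid codes


def looks_like_nucleotide(fasta_path, max_residues=10_000, threshold=0.9):
    """Heuristic: True if the sampled residues are almost all nucleotide letters."""
    dna = total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):  # skip FASTA header lines
                continue
            for ch in line.strip().upper():
                if ch.isalpha():
                    total += 1
                    dna += ch in DNA_LETTERS
            if total >= max_residues:  # a sample is enough for a guess
                break
    return total > 0 and dna / total >= threshold


def build_commands(protein_fasta, query_fasta,
                   db_name="mouse_chicken_prot", out="hits.tsv"):
    """Return (makeblastdb, blastp) argument lists; names are hypothetical."""
    makedb = ["makeblastdb", "-in", str(protein_fasta),
              "-dbtype", "prot", "-out", db_name]
    search = ["blastp", "-query", str(query_fasta), "-db", db_name,
              "-outfmt", "6", "-out", out]  # outfmt 6 = tabular output
    return makedb, search
```

Assuming BLAST+ is installed and on your PATH, you could call looks_like_nucleotide("protein.faa") as a sanity check and then pass each list from build_commands("protein.faa", "query.faa") to subprocess.run(cmd, check=True). To cover both organisms in one database, concatenating the mouse and chicken protein.faa files into a single FASTA file before running makeblastdb should work.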