Right now I'm running
update_blastdb.pl --timeout 300 refseq_genomic
But this takes up hundreds of GB on my computer. I'm wondering if there is a way to get just the genomes I want. For example, if I just want the genomes for Gallus gallus, Mus musculus, and Homo sapiens, how can I do something similar to get just those genomes?
Please explain things if you can. I'm pretty new at this and not very good at linking FTP databases to my BLAST searches.
Thank you very much. I've tried this method, but I cannot execute it correctly and I do not know why.
I then follow up this command with the following and get errors which I do not know how to tackle:
option 1
option 2
Your command is wrong since it does not address a specific file.
I suggest that you use the UCSC links I provided to make your life simpler. The commands in that case should be:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/galGal5/bigZips/galGal5.fa.gz
After you download the files you will need to
gunzip/tar or tar -avf
them to uncompress them. That will be followed by cat'ing the three genome files together:
cat hg38.chromFa.fa mouse.fa chicken.fa > giant_genome.fa
Finally run
mkblastdb -i giant_genome.fa etc
to make the database. Use real file names when cat'ing and appropriate options for mkblastdb when you run the final command.
Note: If you want to make separate databases for the three genomes then don't do the cat step.
Thank you, a few questions though.
Main problem: I'm still getting an error with my makeblastdb command.
gunzip galGal5.fa.gz
I assume you meant to type -in because I have no -i option. When I use -in as you did, I get this error:
When I add in some of these mandatory values I still get an error:
Extra questions:
If I want to get the genomes from the NCBI link I posted, how can I get the specific link?
Is that supposed to be tar -avf? My tar has no -a option.
You will need to go into individual chromosome directories and get the *fa.gz file for each (e.g. Chr1 for Mouse).
Use the UCSC method above. It will save you a bunch of time. Sequence is identical no matter where you get it from.
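On the tar question: extraction needs the x flag, so the usual form is tar -xzvf for a .tar.gz bundle, while a plain .gz file such as galGal5.fa.gz only needs gunzip. The following is a minimal sketch of both cases. It builds tiny stand-in archives under a hypothetical unpack_demo/ directory so it runs without downloading anything; the real inputs would be the UCSC files named earlier, and their extracted layout may differ from this toy example.

```shell
# Stand-in inputs (hypothetical names) so the sketch runs anywhere.
mkdir -p unpack_demo/src unpack_demo/out
printf '>chr1\nACGT\n' > unpack_demo/src/chr1.fa
printf '>chr2\nGGCC\n' > unpack_demo/src/chr2.fa

# A .tar.gz bundle of per-chromosome files, similar in shape to a chromFa.tar.gz
tar -czf unpack_demo/demo.chromFa.tar.gz -C unpack_demo/src chr1.fa chr2.fa

# A single gzipped FASTA, similar in shape to galGal5.fa.gz
printf '>chr1\nTTAA\n' > unpack_demo/demo.fa
gzip -f unpack_demo/demo.fa

# x = extract, z = decompress gzip, v = list files as they are written,
# f = name of the archive to read; -C chooses the output directory
tar -xzvf unpack_demo/demo.chromFa.tar.gz -C unpack_demo/out

# A bare .gz file is not a tar archive, so gunzip alone is enough
gunzip -f unpack_demo/demo.fa.gz
```

After extraction it is worth running ls on the output directory to see the actual file names before moving on to the cat step.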
If you need a primer on Unix then I suggest that you spend some time at this site.
Trying to extract the genomes you need from the BLAST index for nt or refseq_genomic would be a much more tedious undertaking. You can't do it on the fly, so to speak. You will need to download the entire index locally and then do the extraction. The method I described here is more straightforward.
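One practical wrinkle with the cat step: UCSC FASTA headers are typically just chromosome names (for example >chr1), so after concatenating three genomes a hit on "chr1" no longer says which species it came from. A minimal sketch of one way around this is to prefix each header with an assembly label while concatenating. The file names human.fa, mouse.fa and chicken.fa and the concat_demo/ directory are hypothetical stand-ins, and the label scheme is an assumption, not something the thread prescribes.

```shell
mkdir -p concat_demo
# Tiny stand-ins for the decompressed genome files (hypothetical names)
printf '>chr1\nACGT\n>chr2\nGGCC\n' > concat_demo/human.fa
printf '>chr1\nTTAA\n' > concat_demo/mouse.fa
printf '>chr1\nCCGG\n' > concat_demo/chicken.fa

# Start from an empty output file so reruns do not append twice
: > concat_demo/giant_genome.fa

# Each entry is label:file; the label is prepended to every FASTA header
for pair in hg38:human mm10:mouse galGal5:chicken; do
    label=${pair%%:*}
    file=${pair##*:}
    sed "s/^>/>${label}_/" "concat_demo/${file}.fa" >> concat_demo/giant_genome.fa
done

# List the resulting headers to confirm every sequence is labelled
grep '^>' concat_demo/giant_genome.fa
```

If separate databases per genome are wanted instead, the loop body can write to one output file per label rather than appending to a single file.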
Thank you so much for your help. I edited the comment because it still wasn't working, but I think I just need to change the dbtype to fasta.
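A note on that last point for anyone following along: makeblastdb's -dbtype accepts nucl or prot, not fasta. FASTA is the input format, which is controlled by -input_type and is already the default. A minimal sketch for a genome database follows. The db_demo/ directory, the demo_genome name and the toy sequence are hypothetical; for the real run, point -in at your own concatenated FASTA file.

```shell
mkdir -p db_demo
# Toy nucleotide FASTA standing in for the real genome file
printf '>demo_chr1\nACGTACGTACGTACGTACGT\n' > db_demo/demo_genome.fa

if command -v makeblastdb >/dev/null 2>&1; then
    # -in     : input FASTA file
    # -dbtype : nucl for DNA/RNA, prot for protein; genomes are nucl
    # -out    : base name for the database files that get written
    makeblastdb -in db_demo/demo_genome.fa -dbtype nucl \
        -out db_demo/demo_genome -title "demo genome"
else
    # BLAST+ is not installed, so only the input file is prepared
    echo "makeblastdb not found on PATH; install BLAST+ first"
fi
```

Once the database exists, a search would look something like blastn -query my_query.fa -db db_demo/demo_genome, where my_query.fa is a placeholder for your own query file.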