Hi all,
I have a sequence in fasta format that I want to blast against a tumour genome fasta file. I have attempted to create a database of the tumour genome using this:
I am in /gpfs/igmmfs01/eddie/Glioblastoma-WGS/blast
SAMPLE_ID=DO10900T
makeblastdb -in ${SAMPLE_ID}.fa -title "TCGA1" -dbtype nucl
This is the log:
Building a new DB, current time: 03/20/2020 12:33:54
New DB name: /gpfs/igmmfs01/eddie/Glioblastoma-WGS/blast/DO10900T.fa
New DB title: TCGA1
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
These are some of the output files:
DO10900T.fa.00.nhr DO10900T.fa.12.nsq DO10900T.fa.25.nin DO10900T.fa.38.nhr DO10900T.fa.50.nsq DO10900T.fa.63.nin DO10900T.fa.00.nin DO10900T.fa.13.nhr DO10900T.fa.25.nsq DO10900T.fa.38.nin DO10900T.fa.51.nhr DO10900T.fa.63.nsq DO10900T.fa.00.nsq DO10900T.fa.13.nin DO10900T.fa.26.nhr DO10900T.fa.38.nsq DO10900T.fa.51.nin DO10900T.fa.64.nhr DO10900T.fa.01.nhr DO10900T.fa.13.nsq DO10900T.fa.26.nin DO10900T.fa.39.nhr DO10900T.fa.51.nsq DO10900T.fa.64.nin DO10900T.fa.01.nin DO10900T.fa.14.nhr DO10900T.fa.26.nsq DO10900T.fa.39.nin DO10900T.fa.52.nhr DO10900T.fa.64.nsq DO10900T.fa.01.nsq DO10900T.fa.14.nin DO10900T.fa.27.nhr DO10900T.fa.39.nsq DO10900T.fa.52.nin DO10900T.fa.65.nhr DO10900T.fa.02.nhr
And then I want to blast the sequence (CSE) file against the database:
PATIENT_ID=`head -n $SGE_TASK_ID $IDS | tail -n 1`
DATABASE= /gpfs/igmmfs01/eddie/Glioblastoma-WGS/blast/DO10900T.fa
cd $BLAST
blastn -db $DATABASE -query ${CSE} -out ${CSE}_${PATIENT_ID}${TYPE}.out
But I received the classic error about no alias or index file being found:
BLAST Database error: No alias or index file found for nucleotide database [/gpfs/igmmfs01/eddie/Glioblastoma-WGS/blast/DO10900T.fa] in search path [/gpfs/igmmfs01/eddie/Glioblastoma-WGS/blast::]
Any clues?
What is the file size of the input .fa file that you used in your makeblastdb cmd? Can you also check that there is a file called <your blastdb name>.nal present?
It is 177G. And there is no .nal file, just lots of .nhr, .nin and .nsq files, and no .nog, .nsd or .nsi (which I did get for hg38 when I tested this).
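The check asked for above can be scripted. This is only a minimal sketch: check_blast_db is a hypothetical helper name, and it assumes the usual layout where a multi-volume nucleotide database has a <name>.nal alias file and a single-volume one has <name>.nin.

```shell
# Hypothetical helper: report whether a nucleotide BLAST database looks usable.
# Usage: check_blast_db /path/to/DO10900T.fa
check_blast_db() {
    db="$1"
    if [ -f "${db}.nal" ]; then
        # Multi-volume database: the alias file ties the volumes together.
        echo "alias file found: ${db}.nal"
    elif [ -f "${db}.nin" ]; then
        # Single-volume database: the index file carries the database name itself.
        echo "single-volume index found: ${db}.nin"
    else
        # Count leftover volume index files such as DO10900T.fa.00.nin
        n=$(ls "${db}".[0-9]*.nin 2>/dev/null | wc -l | tr -d ' ')
        echo "no alias or index file for ${db}; ${n} volume index file(s) present" >&2
        return 1
    fi
}
```

Something like check_blast_db "$DATABASE" || exit 1 before the blastn line would stop the job early when only the numbered volume files exist.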
There may be a space between the equals sign and the rest of the directory path in
DATABASE= /gpfs/igmmfs01/eddie/Glioblastoma-WGS/blast/DO10900T.fa
At least there is one in the command you pasted above. Can you check and remove that?
Sorry, that was my bad: there wasn't any space in the actual run.
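For anyone wondering why a space there would matter (it turned out not to be the cause in this thread), here is a small illustration. The path /tmp/example_db is made up, and the builtin true stands in for the path that the shell would otherwise try to run as a command.

```shell
unset DATABASE

# With a space, the shell sets DATABASE to an empty string only for the
# command that follows; the word after the space is treated as that command.
DATABASE= true

# ${DATABASE:-} expands to nothing here because the variable is still unset.
echo "with space:    '${DATABASE:-}'"

# Without a space this is an ordinary assignment that persists.
DATABASE=/tmp/example_db
echo "without space: '${DATABASE}'"
```

In the first case the variable stays empty afterwards, so blastn would be handed an empty -db argument.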
Thanks guys! It turned out I didn't have the .nal file because memory had run out. So I re-ran it with enough memory and it worked fine.