hello
I am trying to calculate the PSSM but I keep getting error , where is the problem in this code ?
import os
import re
def command_pssm(content, output_file,pssm_file):
os.system('psiblast \
-in_msa 1ak4.fasta \
-db allseq.fasta \
-num_threads 10 \
-num_iterations 3 \
-evalue 0.001 \
-out output_file\
-out_ascii_pssm PSSM.txt' )
input_file = "1ak4.fasta"
output_file = "myoutput.txt"
out_ascii_pssm ="PSSM.txt"
command_pssm(input_file, output_file, out_ascii_pssm)
as an example I get the following error
BLAST query error: CAlnReader::GetSeqEntry(): Seq_entry is not available until after Read()
even when I run this command, I get the same error
psiblast -subject 1ak4.fasta -in_msa allseq.fasta -out_ascii_pssm pssm.txt
BLAST query error: CAlnReader::GetSeqEntry(): Seq_entry is not available until after Read()
My 1ak4.fasta looks like this
MVNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSCFHRIIPGFMCQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFICTAKTEWLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADCGQLE
my allseq.fasta look like this (few lines)
>1ak4_A, dset72, 165 residues
MVNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSCFHRIIPGFMCQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFICTAKTEWLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADCGQLE
>1ak4_D, dset72, 145 residues
PIVQNLQGQMVHQAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAAEWDRLHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTHNPPIPVGEIYKRWIILGLNKIVRMY
I also tried
import os
import sys
os.system('psiblast -query 1ak4.fasta -db allseq.fasta -num_iterations=6 -evalue=0.005 -out test_result -out_pssm=PSSMtest_results')
print ("Done !!")
but I got error again
admin$ python code.py
BLAST Database error: No alias or index file found for protein database [allseq.fasta] in search path [/Users/admin/Desktop/myexample::]
Done !!
@Mensur Dlakic I just amended the question, let me know if you need any other input info.
As the
-in_msa
switch implies, it expects a multiple sequence alignment as input, while you are giving it a single sequence. That won't work. If you adjust your single sequence so it is in fasta format (add the first line beginning with>
) and use the-query
switch instead of-in_msa
, it should work. That is assuming that yourallseq.fasta
is already formatted usingmakeblastdb
.Mensur Dlakic I am one step close, when I run the makeblastdb , it gives me many output, which one should I use ? I run it like this
makeblastdb -in allseq.fasta -input_type fasta -dbtype prot -out all_seq
All the files created by
makeblastdb
are required. In your case they should look something likeallseq.fasta.p??
where a question mark can be any letter. Simply give the name of your original file (allseq.fasta
like you did before) in (PSI)BLAST search commands and it will automatically look up all the files.PS Just saw that you used the
-out
switch. In that case the file name to use would beall_seq
.Mensur Dlakic I put -in_msa alseq.fasta and -db allseq it gives me error like
BLAST query error: CAlnReader::GetSeqEntry(): Seq_entry is not available until after Read()
Please read carefully what I wrote already. You CAN'T use
-in_msa
and give a single sequence as input. It has to be the-query
switch instead.Mensur Dlakic I just thought the similarity of the names could cause the issue, so, I downloaded human database and then made it as my db using the following . I use the above code and input became allseq.fasta , db became Human but still gives me error !!! I really dont know what is the matter with this
makeblastdb -in Human.fasta -input_type fasta -dbtype prot -out Human