How to create a PSSM from FASTA homologues with NCBI BLAST+ 2.2.23
14.1 years ago

I have a FASTA sequence file with about 10 homologous proteins. What I would like to do is create a PSSM from them and use it to search a transcriptome database.

But how do I create it? The NCBI legacy BLAST package has a makemat executable for exactly this task, but it does not seem to have an equivalent in BLAST+.

The new psiblast offers a variety of options (e.g. -in_msa, -out_pssm) with which it should be possible to create an initial profile, but these two options depend on a database or subject sequences, which does not make much sense to me.

What am I missing? Any help is appreciated.

ncbi blast pssm • 15k views

How can I generate alignment.fasta from the command line?


Maybe you should consider opening a separate question; your question will get lost as a reply to this one. However, your problem (generating an MSA, a multiple sequence alignment) is fairly trivial in bioinformatics, and there must be many threads on that topic. Programs that can do this include MUSCLE, T-Coffee, ClustalW, etc. There may be more modern versions, but the good old tools will do as well.
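
As a minimal sketch of that alignment step, assuming MUSCLE is installed and the unaligned homologues sit in a file called homologues.fasta (a hypothetical name), the wrapper below builds the command. MUSCLE 3.x takes -in/-out while MUSCLE 5 takes -align/-output, so the major version is passed as an optional third argument. The wrapper itself (align_with_muscle, DRY_RUN) is illustrative and not part of any tool; it only prints the command unless DRY_RUN=0 is set.

```shell
# Sketch: build a MUSCLE command that aligns a multi-FASTA file.
# File names are placeholders; check `muscle -h` for your installed version.

# align_with_muscle INPUT OUTPUT [MAJOR_VERSION]
align_with_muscle() (
    in_fasta=$1; out_fasta=$2; major=${3:-3}
    if [ "$major" -ge 5 ]; then
        set -- muscle -align "$in_fasta" -output "$out_fasta"   # MUSCLE 5 syntax
    else
        set -- muscle -in "$in_fasta" -out "$out_fasta"         # MUSCLE 3.x syntax
    fi
    echo "$*"                                    # always show the command
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi  # run only when explicitly requested
)
```

For example, `DRY_RUN=0 align_with_muscle homologues.fasta alignment.fasta` would run the 3.x form and write alignment.fasta, which is the file name used in the psiblast commands in this thread.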

14.1 years ago

Solved.

The correct usage for 2.2.23+ is as follows (-subject produces an error in that version, which is fixed in 2.2.24+):

psiblast -db blastdb -in_msa alignment.fasta -out_ascii_pssm pssm.txt

And for 2.2.24+, supplying a subject FASTA file works:

psiblast -subject oneseq.fasta -in_msa alignment.fasta -out_ascii_pssm pssm.txt

For both approaches, it does not matter whether db/subject contains a single sequence or any subset of the alignment sequences: the PSSM output is exactly the same. Note that the query needs to be supplied with -in_msa in order to generate a PSSM in one step.
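
To connect this back to the original goal of searching a transcriptome, here is a sketch under a few assumptions. The ASCII matrix from -out_ascii_pssm is meant for reading, so the sketch also asks psiblast for a checkpoint file with -out_pssm and hands that to tblastn through -in_pssm. The file names (transcripts.fasta, transcriptome_db, pssm.asn, hits.tsv), the E-value cutoff, and the helper functions are all hypothetical; option availability can differ between BLAST+ releases, so check `psiblast -help` and `tblastn -help` first. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Sketch: MSA -> PSSM checkpoint -> tblastn search of a nucleotide database.

# Print one command, and execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# pssm_search MSA SUBJECT TRANSCRIPTS
pssm_search() (
    msa=$1; subject=$2; transcripts=$3
    # 1. Build the PSSM from the alignment (checkpoint plus readable matrix).
    run psiblast -subject "$subject" -in_msa "$msa" \
        -out_pssm pssm.asn -out_ascii_pssm pssm.txt &&
    # 2. Format the transcriptome as a nucleotide BLAST database.
    run makeblastdb -in "$transcripts" -dbtype nucl -out transcriptome_db &&
    # 3. Search the translated transcriptome with the PSSM, tabular output.
    run tblastn -in_pssm pssm.asn -db transcriptome_db \
        -evalue 1e-5 -outfmt 6 -out hits.tsv
)
```

Calling `pssm_search alignment.fasta oneseq.fasta transcripts.fasta` prints the three commands in order, which makes it easy to review them before running anything.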


The PSSM generated by psiblast bases the whole matrix on the first sequence in the alignment. Any idea why?


I think you should use the -ignore_msa_master option: https://www.ncbi.nlm.nih.gov/books/NBK279694/
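
A sketch of how that option could be added to the command from the accepted answer, assuming a BLAST+ release that supports it (check `psiblast -help`). By default psiblast treats the first sequence of the alignment as the master; -ignore_msa_master drops that behaviour, and the related -msa_master_idx option selects a different master by its 1-based position instead. The two options cannot be combined. The wrapper name and the DRY_RUN switch are illustrative only.

```shell
# Sketch: PSSM from an MSA without tying it to the first aligned sequence.

# pssm_from_msa MSA SUBJECT [MASTER_INDEX]
pssm_from_msa() (
    msa=$1; subject=$2; master_idx=${3:-}
    set -- psiblast -subject "$subject" -in_msa "$msa" -out_ascii_pssm pssm.txt
    if [ -n "$master_idx" ]; then
        set -- "$@" -msa_master_idx "$master_idx"   # use the Nth sequence as master
    else
        set -- "$@" -ignore_msa_master              # use no master sequence at all
    fi
    echo "$*"                                    # always show the command
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi  # run only when explicitly requested
)
```

`pssm_from_msa alignment.fasta oneseq.fasta` prints the -ignore_msa_master form, while `pssm_from_msa alignment.fasta oneseq.fasta 3` prints the variant that uses the third sequence as master.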


I get an error: BLAST query error: CAlnReader::GetSeqEntry(): Seq_entry is not available until after Read(). Do you know why?

14.1 years ago
Rm 8.3k

Workaround: create a database from those 10 sequences, either on their own or together with other homologous sequences, and then generate the profile (PSSM) while searching against that database.
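
A rough sketch of this workaround, assuming the 10 protein sequences are in homologues.fasta and an alignment of them is in alignment.fasta (both names, the database name homologues_db, and the helper functions are hypothetical). It formats the sequences as a protein database with makeblastdb and then runs psiblast against it with the alignment as input. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Sketch: build a small protein database, then derive the PSSM while searching it.

# Print one command, and execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# pssm_via_db SEQUENCES MSA
pssm_via_db() (
    seqs=$1; msa=$2
    # 1. Format the homologues as a protein BLAST database.
    run makeblastdb -in "$seqs" -dbtype prot -out homologues_db &&
    # 2. Search that database with the alignment and write the ASCII PSSM.
    run psiblast -db homologues_db -in_msa "$msa" -out_ascii_pssm pssm.txt
)
```

Usage would be `pssm_via_db homologues.fasta alignment.fasta`.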


Since a database and a subject file are equivalent (one being a BLAST DB and the other a multi-FASTA, obviously), creating a db does not make much sense; even if I create a database first, psiblast does not create the PSSM file, for whatever reason.


You were indeed pointing me in the right direction. Thank you for that! I will accept my own answer though because it is more complete (sorry for that). But I modded you up ;-)
