Question

How to create a PSSM from FASTA homologues with NCBI BLAST+ 2.2.23

Asked 14.5 years ago by Michael Schubert

I have a FASTA sequence file with about 10 homologous proteins. What I would like to do is create a PSSM from them and use it to search a transcriptome database.

But how do I create it? The legacy NCBI BLAST package has a makemat executable for exactly this task, but it does not seem to have an equivalent in BLAST+.

The new psiblast offers a variety of options (e.g. -in_msa, -out_pssm) that should make it possible to create an initial profile, but these options require a database or subject sequences (which does not make much sense to me).

What am I missing? Any help is appreciated.
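For readers new to PSSMs, the underlying idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions (every sequence counts equally, a flat pseudocount of 1, a uniform 1/20 background), not what psiblast or makemat compute internally; the function name `toy_pssm` is hypothetical.

```python
import math
from typing import Dict, List

# The 20 standard amino acids; anything else (gaps, X, lowercase) is ignored.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def toy_pssm(rows: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return one {residue: log-odds score in bits} dict per alignment column.

    Simplifying assumptions: every sequence counts equally, a flat
    pseudocount is added to each residue, and the background frequency
    is uniform (1/20). psiblast's real scoring differs in all three.
    """
    if not rows:
        raise ValueError("need at least one aligned sequence")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("rows must be aligned to equal length")
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for col in range(width):
        # Residues observed in this column, skipping gaps and non-standard letters.
        residues = [r[col] for r in rows if r[col] in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log2(((residues.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


if __name__ == "__main__":
    pssm = toy_pssm(["MK", "MR", "MK"])
    # The conserved M in column 0 scores positive; unseen residues score negative.
    print(round(pssm[0]["M"], 2), round(pssm[0]["A"], 2))
```

Conserved columns produce strongly positive scores for the observed residue, which is what lets a profile search pick up distant homologues that a single query sequence would miss.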

Tags: ncbi, blast, pssm

How can I generate alignment.fasta from the command line?

Reply by mjavad2012, 6.6 years ago

Maybe you should consider opening a separate question; your question will get lost as a reply here. That said, your problem (generating an MSA, i.e. a multiple sequence alignment) is fairly trivial in bioinformatics, and there must be many threads on that topic. Programs that can do this include MUSCLE, T-Coffee, ClustalW and others. There may be more modern tools, but the good old ones will do as well.
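Once an aligner has produced alignment.fasta, it may be worth checking that the file really is an alignment before handing it to `psiblast -in_msa`. A minimal Python sketch, assuming the aligned-FASTA convention that every row (gaps `-` included) has the same length; `read_fasta`, `_store` and `check_alignment` are hypothetical helpers, not part of BLAST+.

```python
from typing import Dict, List, Optional


def _store(records: Dict[str, str], header: str, chunks: List[str]) -> None:
    """Add one record, refusing duplicate IDs (they would silently overwrite)."""
    if header in records:
        raise ValueError(f"duplicate sequence ID: {header!r}")
    records[header] = "".join(chunks)


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into an insertion-ordered {id: sequence} dict."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                _store(records, header, chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""  # keep only the ID token
            chunks = []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        _store(records, header, chunks)
    return records


def check_alignment(records: Dict[str, str]) -> int:
    """Return the alignment width; raise ValueError if rows differ in length."""
    if not records:
        raise ValueError("no sequences found")
    lengths = {name: len(seq) for name, seq in records.items()}
    widths = set(lengths.values())
    if len(widths) != 1:
        raise ValueError(f"rows have different lengths: {lengths}")
    return widths.pop()


if __name__ == "__main__":
    example = ">seq1 first homologue\nMKV-LA\n>seq2\nMKVQLA\n"
    print(check_alignment(read_fasta(example)))  # prints 6
```

If the rows differ in length, the file most likely contains unaligned sequences rather than an alignment.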

Reply by Carambakaracho, 6.6 years ago

Answer 1 · 2010-10-25 (last edited by Ram)

Answered 14.5 years ago by Michael Schubert

Solved.

The correct usage for 2.2.23+ is as follows (-subject produces an error in that version, which is fixed in 2.2.24+):

psiblast -db blastdb -in_msa alignment.fasta -out_ascii_pssm pssm.txt

For 2.2.24+, supplying a subject FASTA file also works:

psiblast -subject oneseq.fasta -in_msa alignment.fasta -out_ascii_pssm pssm.txt

With either approach, it does not matter whether the database/subject contains a single sequence or any subset of the aligned sequences; the PSSM output is exactly the same. Note that the query must be supplied via -in_msa in order to generate the PSSM in one step.
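The two invocations above differ only in how the (otherwise irrelevant) database or subject is supplied. A small Python sketch that assembles either command line; the helper name `psiblast_msa_cmd` is hypothetical, while the flags are exactly the ones used in the answer. It only builds the argument list; actually running it requires BLAST+ on the PATH.

```python
from typing import List, Optional


def psiblast_msa_cmd(msa: str, out_ascii_pssm: str,
                     db: Optional[str] = None,
                     subject: Optional[str] = None) -> List[str]:
    """Return argv for building an ASCII PSSM from an MSA.

    Exactly one of db (BLAST database name, works in 2.2.23+) or subject
    (FASTA file, works from 2.2.24+) must be given. Per the answer, which
    sequences it contains does not change the resulting PSSM.
    """
    if (db is None) == (subject is None):
        raise ValueError("give exactly one of db= or subject=")
    target = ["-db", db] if db is not None else ["-subject", subject]
    return ["psiblast", *target, "-in_msa", msa, "-out_ascii_pssm", out_ascii_pssm]


if __name__ == "__main__":
    # Equivalent to the 2.2.23+ command line in the answer.
    cmd = psiblast_msa_cmd("alignment.fasta", "pssm.txt", db="blastdb")
    print(" ".join(cmd))
    # To execute where BLAST+ is installed: subprocess.run(cmd, check=True)
```

If the profile is meant to be reused for searching rather than inspected, psiblast's -out_pssm option writes a checkpoint file that psiblast can read back with -in_pssm; as far as we know, the ASCII output is not accepted as input.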

The PSSM generated by psiblast bases the whole matrix on the first sequence in the alignment. Any idea why?

Reply by microbeatic, 11.4 years ago

I think you should use the -ignore_msa_master option (see https://www.ncbi.nlm.nih.gov/books/NBK279694/).
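Why one sequence ends up as the reference: a PSSM has one row per position of a query, so the alignment columns have to be projected onto a single "master" row. The Python sketch below illustrates that idea under simplifying assumptions (plain counts, no sequence weighting or pseudocounts); `master_columns` is a hypothetical helper. psiblast uses the first sequence as master unless -msa_master_idx selects another, and -ignore_msa_master excludes the master from the PSSM computation; whether that option behaves exactly like the `ignore_master` flag here is an assumption worth checking against the BLAST+ manual.

```python
from collections import Counter
from typing import List


def master_columns(rows: List[str], master_idx: int = 0,
                   ignore_master: bool = False) -> List[Counter]:
    """Return residue counts for each non-gap position of the master row.

    master_idx is 0-based here (psiblast's -msa_master_idx is 1-based).
    ignore_master leaves the master's own residues out of the counts,
    loosely mimicking the intent of -ignore_msa_master (an assumption).
    """
    master = rows[master_idx]
    profile = []
    for col, master_res in enumerate(master):
        if master_res == "-":
            continue  # no query position for this column, so no PSSM row
        counts: Counter = Counter()
        for i, row in enumerate(rows):
            if ignore_master and i == master_idx:
                continue
            if row[col] != "-":
                counts[row[col]] += 1
        profile.append(counts)
    return profile


if __name__ == "__main__":
    rows = ["MK-V", "MKAV", "MRAV"]
    print(len(master_columns(rows)))                # 3: the master has one gap
    print(len(master_columns(rows, master_idx=1)))  # 4: the second row has none
```

Changing the master changes the number and meaning of the matrix rows, which is why the output always looks "based on" one particular sequence.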

Reply by lagartija, 6.0 years ago

I get an error: BLAST query error: CAlnReader::GetSeqEntry(): Seq_entry is not available until after Read(). Do you know why?

Reply by LittLe, 2.6 years ago

Answer 2 · 2010-10-21

Answered 14.5 years ago by Rm

Workaround: create a database from those 10 sequences alone, or together with other homologous sequences, and then generate the profile (PSSM) while searching against that database.
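This workaround can be scripted as two steps: build a protein BLAST database from the homologues with makeblastdb, then run psiblast against it with the alignment supplied via -in_msa (the form the accepted answer uses for 2.2.23+). A hypothetical Python helper that only assembles the two command lines; the file and database names are placeholders.

```python
from typing import List


def workaround_cmds(homologues_fasta: str, msa: str, db_name: str,
                    out_ascii_pssm: str) -> List[List[str]]:
    """Return [makeblastdb argv, psiblast argv] for the database workaround."""
    make_db = ["makeblastdb", "-in", homologues_fasta,
               "-dbtype", "prot",   # protein sequences
               "-out", db_name]     # base name later passed to -db
    build_pssm = ["psiblast", "-db", db_name, "-in_msa", msa,
                  "-out_ascii_pssm", out_ascii_pssm]
    return [make_db, build_pssm]


if __name__ == "__main__":
    for argv in workaround_cmds("homologues.fasta", "alignment.fasta",
                                "blastdb", "pssm.txt"):
        print(" ".join(argv))
        # Where BLAST+ is installed, run in order: subprocess.run(argv, check=True)
```

The database is built from the unaligned homologues here, since the accepted answer reports that its contents do not affect the resulting PSSM.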

Since a database and a subject file are equivalent here (one being a BLAST database and the other a multi-FASTA file), creating a database does not make much sense. Besides, even when I create a database first, psiblast does not write the PSSM file for some reason.

Reply by Michael Schubert, 14.5 years ago

You were indeed pointing me in the right direction, thank you! I will accept my own answer, though, because it is more complete (sorry about that). But I upvoted you ;-)

Reply by Michael Schubert, 14.5 years ago