How to build a local Pfam database?
8.2 years ago

Hi everyone

I would like to build a local database (on my server) to search for conserved domains using the Pfam database. I think this is possible with RPS-BLAST from NCBI BLAST+. Could someone tell me which Pfam files I should download and how to build this database on my server?

blast • 13k views

You can use the InterProScan program to search for conserved domains.


Hi

Just wanted to know if you found an answer to your question. I am also trying to download Pfam to use it locally, but I can't find any useful links.

Thanks

6.3 years ago
h.mon 35k

The Pfam database is a large collection of protein domain families. Each family is represented by multiple sequence alignments and a hidden Markov model (HMM).

Here are two solutions for setting up protein domain searches:

1) You can set up a local database for use with HMMER, the same program the Pfam site uses to search for protein domains in submitted queries. Download the HMM models, then search them with hmmscan or hmmsearch. hmmscan requires the database to be prepared with hmmpress first; hmmsearch reads the HMM file directly.

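# Download and unpack the Pfam-A profile HMMs
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam31.0/Pfam-A.hmm.gz
gunzip Pfam-A.hmm.gz
# Build the binary index files (needed by hmmscan, not by hmmsearch)
hmmpress Pfam-A.hmm
# Search every Pfam model against the protein set
hmmsearch --tblout out.txt -E 1e-5 --cpu 2 Pfam-A.hmm input_proteins.faa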
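
The table written by --tblout is whitespace-delimited, with comment lines starting with #. As a minimal sketch, assuming an out.txt produced by the hmmsearch command above (so the target is the protein and the query is the Pfam model), a small helper can reduce it to protein, Pfam family, Pfam accession, and full-sequence E-value. The function name is made up for illustration and is not part of HMMER.

```shell
# Hypothetical helper: summarise an hmmsearch --tblout file.
# With hmmsearch, column 1 is the target (protein), columns 3 and 4 are the
# query (Pfam model) name and accession, and column 5 is the full-sequence E-value.
pfam_hits_from_tblout() {
    # Skip comment lines and anything too short to be a hit row
    awk '!/^#/ && NF >= 5 { printf "%s\t%s\t%s\t%s\n", $1, $3, $4, $5 }' "$1"
}

# Usage:
# pfam_hits_from_tblout out.txt | sort -k1,1 > hits.tsv
```

Note that if you run hmmscan instead, the target and query columns swap roles (the Pfam model becomes the target), so the column choice would need adjusting.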

2) RPS-BLAST uses databases compiled from position-specific scoring matrices. The Conserved Domain Database (CDD) encompasses data from several other databases, Pfam being just one of them (see the CDD documentation for details). NCBI provides a pre-formatted RPS-BLAST database derived from Pfam:

ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Pfam_LE.tar.gz

You can use this (and the others available) to set up a local RPS-BLAST database.
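
A rough sketch of the search step, assuming the archive has been extracted into a directory and that the database inside it is named Pfam (this name is an assumption; check the extracted file names and adjust). The function name, directory, and output file are hypothetical; rpsblast itself ships with NCBI BLAST+.

```shell
# Sketch only: wraps a basic rpsblast call with tabular output.
# The database prefix (e.g. pfam_db/Pfam) is an assumption about the
# archive contents; list the extracted files to confirm it.
search_pfam_rpsblast() {
    query=$1      # protein FASTA file
    db=$2         # path prefix of the extracted RPS-BLAST database
    out=$3        # tabular output file
    rpsblast -query "$query" -db "$db" -evalue 1e-5 -outfmt 6 -out "$out"
}

# Usage (after: mkdir pfam_db && tar -xzf Pfam_LE.tar.gz -C pfam_db):
# search_pfam_rpsblast input_proteins.faa pfam_db/Pfam rps_hits.tsv
```

The E-value cutoff of 1e-5 simply mirrors the HMMER example above; it is not a recommended threshold.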


Thanks a lot, I struggled with this for a while.

6.3 years ago
agata88 ▴ 870

Not sure if this question is still relevant, but I will answer anyway.

Here is the link to the Pfam amino acid FASTA database:

ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.fasta.gz

After decompressing the file with gunzip, you can create a local database by running the command below (first install NCBI BLAST+, e.g. sudo apt-get install ncbi-blast+):

makeblastdb -in Pfam-A.fasta -dbtype prot

Then search your amino acid sequences against the Pfam database. An example with blastp:

blastp -query file.fasta -task blastp-fast -db Pfam-A.fasta -out blast_results.txt -evalue 0.001 -max_target_seqs 1 -num_threads 8 -outfmt '6 qaccver qlen pident length evalue stitle' > blast_log.txt

where:

-max_target_seqs 1 - report at most one target sequence per query

One line of output:

orf1    93  100.000 93  1.97e-62    A0A087VDV9_BALRE/1-211 A0A087VDV9.1 PF00244.19;14-3-3;
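
To turn such lines into a per-query Pfam assignment, the family accession and name can be pulled out of the title column. This is only a sketch, assuming tab-separated output with the six columns requested above and titles that end in a field like "PF00244.19;14-3-3;" as in the example line; the function name is hypothetical.

```shell
# Hypothetical helper: extract the Pfam accession and family name from the
# stitle column (column 6 with the -outfmt string used above).
pfam_from_blast_tab() {
    awk -F '\t' '{
        n = split($6, words, " ")      # last word looks like "PF00244.19;14-3-3;"
        split(words[n], fam, ";")      # fam[1] = accession, fam[2] = family name
        printf "%s\t%s\t%s\t%s\n", $1, fam[1], fam[2], $5
    }' "$1"
}

# Usage:
# pfam_from_blast_tab blast_results.txt > pfam_assignments.tsv
```

The output columns are query, Pfam accession, family name, and E-value.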

Best, Agata


Your answer is not accurate, as you are only creating a regular BLAST database from the Pfam FASTA sequences. The OP wants to search for protein domains, so they need a method that takes the domains into account, not only the sequences.
