How to Subset a Custom BLAST Database for Specific Genes of Interest?
5 months ago

I have a local copy of the NR protein database to run blastp. I'm interested in analyzing a specific subset of prokaryotic genes. I've already extracted prokaryotic proteins from the NR database using the following steps:

1. Extracting proteins from bacteria and archaea taxids:

blastdbcmd -db nr -taxids 2,2157 -dbtype prot -out prokaryote_sequences.fasta

2. Creating a BLAST database from the extracted sequences:

makeblastdb -in prokaryote_sequences.fasta -dbtype prot -out nr_prok

Now I need to subset this database further so that it contains only specific genes of interest, such as rpoB. However, I suspect that a simple grep on the FASTA headers won't be sufficient, because not every rpoB sequence will have "rpoB" in its header.

My Question:

What is the best way to filter my custom BLAST database to include only the proteins of specific genes like rpoB?


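You could run blastp with a known sequence of the gene of interest (for example a trusted rpoB protein) as the query against this custom database, then extract the IDs of the hits, pull those sequences out as FASTA, and build another subset database from them.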
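
A rough sketch of that workflow, in case it helps. The file names (rpoB_query.fasta, rpoB_hits.tsv, rpoB_ids.txt, rpoB_sequences.fasta, nr_prok_rpoB) and the e-value cutoff are placeholders I made up, not recommendations. It also assumes nr_prok was built with -parse_seqids, since blastdbcmd generally needs parsed IDs to look entries up by accession; the makeblastdb command in the question does not use that flag, so the database may have to be rebuilt first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs: a FASTA file with one or more trusted rpoB proteins,
# and the prokaryote-only database from the question.
QUERY=rpoB_query.fasta
DB=nr_prok

# Print the unique subject IDs (column 2) of a BLAST tabular (-outfmt 6) file.
unique_subject_ids() {
    cut -f2 "$1" | sort -u
}

build_subset() {
    # 1. Search the custom database with the known rpoB sequence(s).
    #    The e-value threshold is an assumed starting point; tune it
    #    (and consider filtering on qcovs) to avoid partial or spurious hits.
    blastp -query "$QUERY" -db "$DB" \
        -evalue 1e-10 -max_target_seqs 100000 \
        -outfmt "6 qseqid sseqid pident length evalue bitscore qcovs" \
        -out rpoB_hits.tsv

    # 2. Collect the IDs of all hits, one per line.
    unique_subject_ids rpoB_hits.tsv > rpoB_ids.txt

    # 3. Pull the matching sequences out of the database as FASTA.
    blastdbcmd -db "$DB" -entry_batch rpoB_ids.txt -out rpoB_sequences.fasta

    # 4. Build the subset database from those sequences.
    makeblastdb -in rpoB_sequences.fasta -dbtype prot -parse_seqids -out nr_prok_rpoB
}

# Only run the pipeline when explicitly asked to, e.g. ./subset_rpoB.sh --run
if [[ "${1:-}" == "--run" ]]; then
    build_subset
fi
```

Because the selection is based on sequence similarity rather than header text, this should also pick up rpoB proteins whose headers never mention "rpoB", but it is worth spot-checking the hits near the cutoff before trusting the subset.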
