Question

conducting individual blastp for multiple protein sequences

3.3 years ago

rb77 • 0

Hello,

I need to BLAST multiple protein sequences from a given species (in a multi-FASTA file) against the human protein database. The goal is to find the closest homolog for each protein sequence.

Is there a way to automate this, i.e. run an individual blastp query for each protein sequence against the whole human protein database and then grab the top hits for each query? Thank you; I would appreciate any advice.

blastp multiple blast protein • 1.4k views

updated 3.3 years ago by Yannick Wurm ★ 2.5k • written 3.3 years ago by rb77 • 0

I would BLAST the whole multi-FASTA file against the DB and grep the results afterwards. Otherwise you create a substantial amount of overhead, such as loading the DB into memory for every single query.
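
To illustrate the "grep afterwards" step: a minimal sketch, assuming the search was run with tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `best_hits` is made up for this example, and "top hit" is taken here to mean the highest bit score per query, which is one reasonable choice rather than the only one.

```python
import csv
from typing import Dict, Iterable, Tuple

# (subject_id, evalue, bitscore) for the best-scoring hit of a query
Hit = Tuple[str, float, float]


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep, for every query ID, the hit with the highest bit score.

    Expects lines in BLAST tabular format (-outfmt 6, default 12 columns).
    """
    best: Dict[str, Hit] = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        # Skip blank, truncated, or comment lines (e.g. from -outfmt 7).
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Because every output row carries its query ID, one run over the whole multi-FASTA file still yields a separate top hit per protein. An open file handle can be passed straight in, e.g. `best_hits(open("results.tsv"))`, where `results.tsv` is a placeholder name.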

3.3 years ago by lieven.sterck 15k

When I try to BLAST the whole multi-FASTA file against the DB, it says:

"Your total query length is greater than allowed on the BLAST webserver. You can either reduce the size to 100,000 or less and try again or run stand-alone <@STANDALONE_DOC@> or our <@STANDALONE_DOC_CLOUD@>."

Also, I need the top hit for each protein sequence in the FASTA file, so I'm not sure whether blasting the whole multi-FASTA file will work.

3.3 years ago by rb77 • 0

It sounds like you are running this remotely at NCBI. Perhaps split your multi-FASTA file into pieces and try again. If you have thousands of sequences, keep in mind that the public BLAST resource is not meant to support that kind of workload.
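
Splitting could look roughly like this: a small sketch that groups FASTA records into batches whose combined sequence length stays within a limit. The 100,000 default is taken from the error message quoted above; the function names `read_fasta` and `batch_by_length` are made up for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", full sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batch_by_length(records: Iterable[Record],
                    max_residues: int = 100_000) -> Iterator[List[Record]]:
    """Group records so each batch holds at most max_residues residues.

    A single sequence longer than the limit still gets a batch of its own.
    """
    batch: List[Record] = []
    total = 0
    for header, seq in records:
        if batch and total + len(seq) > max_residues:
            yield batch
            batch, total = [], 0
        batch.append((header, seq))
        total += len(seq)
    if batch:
        yield batch
```

Each batch can then be written back out as its own FASTA file and submitted separately.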

While not advisable, you could select only 1 "hit" per query (NCBI ideally recommends 5, since the first hit is not guaranteed to be the best).
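
If you switch to the stand-alone BLAST+ that the error message points to, limiting the hits per query might look like the sketch below. It assumes `blastp` is installed and on your PATH and that a protein database has already been built with `makeblastdb`; the file and database names are placeholders, and `build_blastp_command` and `run_blastp` are made-up helper names.

```python
import subprocess
from typing import List


def build_blastp_command(query_fasta: str, db: str, out_tsv: str,
                         max_hits: int = 5, threads: int = 4) -> List[str]:
    """Assemble a blastp call that reports up to max_hits subjects per query."""
    return [
        "blastp",
        "-query", query_fasta,              # the whole multi-FASTA file
        "-db", db,                          # database name given to makeblastdb
        "-outfmt", "6",                     # tab-separated, one row per hit
        "-max_target_seqs", str(max_hits),  # cap on subjects per query
        "-num_threads", str(threads),
        "-out", out_tsv,
    ]


def run_blastp(query_fasta: str, db: str, out_tsv: str) -> None:
    """Run blastp and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_blastp_command(query_fasta, db, out_tsv), check=True)
```

For example, `run_blastp("my_species.faa", "human_proteins", "results.tsv")` would search every sequence in the file in one go, with all three names being hypothetical.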

3.3 years ago by GenoMax 150k

Furthermore, a simple BLAST search is insufficient to establish homology. There are dedicated tools for this, some of which are based on BLAST.

I would recommend a literature search.

3.3 years ago by Yannick Wurm ★ 2.5k