Question

Blastp stopped

Asked 2.9 years ago by carlosgonzalezcruz327

Hi everyone, I'm new to this and I have a problem. I was running blastp with the following command line:

./blastp -query longest.pep -db uniprop.pep -num_threads 4 -max_target_seqs 1 -evalue 1e-5 -outfmt 6 -out blastpc.latifolia.cvs

It ran for about 10 hours and then stopped. My query has about 280 thousand amino acid sequences (longest ORFs) from a de novo transcriptome, and my database has more than half a million proteins, but the output file has only about 55 thousand matches.

Why doesn't it report all the matches, and why did it stop at around 50 thousand?

I'm using a Lenovo desktop.

Thanks

Is this on a personal machine (i.e. are you the only user)? Did you run out of disk space where the output was being written? Since 55K sequences are in the output, this was obviously working (so you must have enough RAM to run this search).

Reply by GenoMax, 2.9 years ago

Hi, thanks for answering.

Yes, I'm the only user. My machine has 16 GB of RAM and an Intel® Core™ i7-10700 CPU @ 2.90GHz × 16.

Reply by carlosgonzalezcruz327, 2.9 years ago

Run "df -h ." in the same directory you are running BLAST from. I am a bit concerned because of ./blastp: it means you are running the query in the same directory where BLAST is installed. That is probably not a good idea, although I doubt it is the reason BLAST stopped.
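The same disk-space check can be scripted so it is easy to rerun while a long search is going. A minimal sketch, assuming a POSIX-style df that supports -P and -k (so the free space is field 4 of the second output line, in KiB); the function names and the 1 GiB threshold are hypothetical choices, not anything BLAST requires:

```shell
# Free space (KiB) on the filesystem that holds directory $1 (default: .).
# -P forces one line per filesystem, -k reports 1024-byte blocks,
# so "Available" is field 4 of line 2.
free_kb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Print "low" if free space is below $2 KiB (default 1 GiB), else "ok".
check_space() {
    dir="${1:-.}"
    min_kb="${2:-1048576}"   # hypothetical threshold: 1 GiB
    avail=$(free_kb "$dir")
    if [ "$avail" -lt "$min_kb" ]; then
        echo "low"
    else
        echo "ok"
    fi
}
```

Usage: run check_space . in the directory where blastp writes its -out file; "low" would suggest the output may have been cut off by a full disk.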

Reply by Michael, 2.9 years ago

Also, what do you mean by "blast stopped"? Has it just not produced any more output for a while? Possibly it is only writing output in chunks and will continue after a while. Blasting 280k sequences might take much longer than 10 hours, so it might be worth just waiting.

Reply by Michael, 2.9 years ago

Everything stopped; it only produced one file.

Reply by carlosgonzalezcruz327, 2.9 years ago

The way you are running the search, you are only going to get one file. You will need to use -outfmt 7 to include queries that did not produce any hits. So the 55K entries you are seeing are likely the queries that actually produced a hit. Your search may actually have completed in 10 h.
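One way to check whether the search really finished is to compare the number of query sequences with the number of distinct queries that produced hits. A minimal sketch, assuming one ">" header line per FASTA record and the default -outfmt 6 columns (query ID in column 1); the function names are hypothetical. Even with -max_target_seqs 1, a query can appear on several lines (one per HSP), so the IDs are de-duplicated before counting. For -outfmt 7 output, it relies on the "# Query:" comment line written for every query and the "# 0 hits found" line written for queries without hits:

```shell
# Number of sequences in a FASTA file (one '>' header per record).
count_fasta_records() {
    grep -c '^>' "$1"
}

# Distinct query IDs with at least one hit in tabular output.
# Comment lines (outfmt 7) are skipped; column 1 is the query ID.
count_queries_with_hits() {
    grep -v '^#' "$1" | cut -f1 | sort -u | wc -l | tr -d ' '
}

# outfmt 7 only: every processed query gets a "# Query:" line.
count_queries_fmt7() {
    grep -c '^# Query:' "$1"
}

# outfmt 7 only: queries that produced no hits.
count_queries_without_hits_fmt7() {
    grep -c '^# 0 hits found' "$1"
}
```

For example, count_fasta_records longest.pep versus count_queries_with_hits blastpc.latifolia.cvs. With -outfmt 7, if count_queries_fmt7 matches the FASTA count, every query was processed, and the "0 hits found" count explains the gap.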

Reply by GenoMax, 2.9 years ago

I agree; it looks like everything went just "fine" and the process finished. Use -outfmt 7 or 0 to see all queries. 55k out of 280k seems rather low. I would try blastx instead: it will take longer, but the result should be more robust to frameshifts and fragmented transcripts than a longest-ORF approach. If you want to reduce runtime, you could reduce the assembly to just the longest isoform per gene (unigene). In fact, I would always prefer "longest isoform -> blastx" over "all transcripts -> translation -> blastp".
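A minimal sketch of the "longest isoform -> blastx" route, assuming a Trinity-style assembly whose headers look like >TRINITY_DN10_c0_g1_i2, so the gene ID is the first word with the trailing _iN removed. Other assemblers name isoforms differently, and the gene rule would need adapting. The function names and file arguments are hypothetical. The database passed to blastx must be a protein database (for example one built with makeblastdb -dbtype prot):

```shell
# Keep the longest isoform per gene from a FASTA assembly.
# ASSUMPTION: Trinity-style IDs; gene = first word minus trailing "_i<N>".
# Sequences are written on a single line each.
longest_isoform_per_gene() {
    awk '
        function flush() {
            if (id == "") return
            gene = id
            sub(/_i[0-9]+$/, "", gene)
            if (!(gene in best_len) || length(seq) > best_len[gene]) {
                best_len[gene] = length(seq)
                best_rec[gene] = header "\n" seq
            }
        }
        /^>/ { flush(); header = $0; id = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END {
            flush()
            for (g in best_rec) print best_rec[g]
        }
    ' "$1"
}

# Hypothetical wrapper: translated search of nucleotide transcripts ($1)
# against a protein database ($2), tabular output with comments to $3.
run_blastx() {
    blastx -query "$1" -db "$2" -evalue 1e-5 -max_target_seqs 1 \
        -num_threads 4 -outfmt 7 -out "$3"
}
```

Usage would be along the lines of longest_isoform_per_gene transcripts.fasta > unigenes.fasta, then run_blastx unigenes.fasta uniprot_db unigenes.blastx.tsv, where all three names are placeholders.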

Reply by Michael, 2.9 years ago

Thanks for all the answers.

I'm going to try your recommendations.

Reply by carlosgonzalezcruz327, 2.9 years ago

Please confirm whether the above was true, i.e. that the job finished properly.

Reply by GenoMax, 2.9 years ago

I ran blastx with the transcriptome against UniProt and got 97k hits.

Reply by carlosgonzalezcruz327, 2.9 years ago

That is indeed an improvement. Could you blast against all of NR too?

Reply by Michael, 2.9 years ago