2.9 years ago
carlosgonzalezcruz327
Hi everyone, I'm new to this and I have a problem. I was running blastp with the following command line:
./blastp -query longest.pep -db uniprop.pep -num_threads 4 -max_target_seqs 1 -evalue 1e-5 -outfmt 6 -out blastpc.latifolia.cvs
It ran for about 10 hours and then stopped. My query has about 280,000 amino acid sequences (longest ORFs) from a de novo transcriptome, and my database has more than half a million proteins, but the output file has only about 55,000 matches.
Why didn't it report all matches, and why did it stop at around 50,000?
I'm using a Lenovo desktop.
Thanks.
Is this on a personal machine (i.e. you are the only user)? Did you run out of disk space on the drive where the output was being written? Since 55K sequences are in the output, this was obviously working (so you must have enough RAM to run this search).
Hi, thanks for answering.
Yes, I'm the only user. My machine has 16 GB of RAM and an Intel® Core™ i7-10700 CPU @ 2.90GHz × 16.
Run
df -h .
in the same directory you are running BLAST from. I am a bit concerned about ./blastp, because it means you are running the query in the same directory where BLAST is installed. That is maybe not a good idea, although I doubt it is the reason BLAST stopped. Also, what does "stopped" mean? Has it not produced any more output for a while? Possibly it is just writing output in chunks and will continue after a while. Blasting 280k sequences might take much longer than 10 hours, so it might be worth just waiting.
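To tell a finished or dead run from one that is still working, one option is to check whether a blastp process is alive and whether the report keeps growing. A minimal bash sketch, assuming a Linux machine with pgrep available; the function name report_growing and the 60 s default interval are hypothetical choices, not BLAST options:

```shell
#!/usr/bin/env bash
# Sketch: is blastp still alive, and is the report still growing?
# ASSUMPTION: Linux with pgrep available; report_growing and the 60 s default
# interval are hypothetical choices, not BLAST options.
set -euo pipefail

report_growing() {
  local report=$1 interval=${2:-60}
  local before after
  before=$(( $(wc -l < "$report") ))   # line count now (arithmetic strips padding)
  sleep "$interval"
  after=$(( $(wc -l < "$report") ))    # line count after waiting
  # pgrep -x matches the exact process name, e.g. a running ./blastp.
  if pgrep -x blastp > /dev/null 2>&1; then
    echo "blastp running; lines $before -> $after"
  else
    echo "no blastp process; lines $before -> $after"
  fi
}
```

For example, report_growing blastpc.latifolia.cvs 300 compares the line count over five minutes. Since BLAST may write results in batches, an unchanged count over a short interval does not by itself mean the job died; the process check is the more telling part.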
Everything stopped; it only produced one file.
The way you are running the search, you are only going to get one file. You will need to run with -outfmt 7 to include queries that did not produce any hits. So the 55K entries you are observing are likely those queries that actually produced a hit. Your search may actually have completed in 10 h.
I agree, it looks like everything went just "fine" and the process finished. Use -outfmt 7 or 0 to see all queries. That said, 55k out of 280k seems rather low. I would try blastx instead; it will take longer, but the result should be more robust to frameshifts and fragmented transcripts than a longest-ORF approach. If you want to reduce runtime, you could reduce the assembly to just the longest isoform per gene (unigene). In fact, I would prefer "longest isoform -> blastx" over "all isoforms -> translation -> blastp" any time.
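The longest-isoform-per-gene reduction could be sketched in bash/awk as below. This assumes a Trinity-style assembly whose headers look like >TRINITY_DN10_c0_g1_i2 ..., so the gene ID is the first word with its trailing _i<number> removed; the thread does not say which assembler was used, other assemblers name transcripts differently, and longest_isoform_per_gene is a hypothetical helper name. Each kept sequence is written back on a single line.

```shell
#!/usr/bin/env bash
# Sketch: keep only the longest isoform per gene from a transcriptome FASTA.
# ASSUMPTION: Trinity-style headers such as ">TRINITY_DN10_c0_g1_i2 len=...",
# so the gene ID is the first word minus its trailing "_i<number>".
# longest_isoform_per_gene is a hypothetical helper name.
set -euo pipefail

longest_isoform_per_gene() {
  awk '
    # Compare the record just read with the best isoform stored for its gene.
    function flush() {
      if (id == "") return
      gene = id
      sub(/_i[0-9]+$/, "", gene)
      if (!(gene in best_len)) { order[++n] = gene; best_len[gene] = -1 }
      if (length(seq) > best_len[gene]) {
        best_len[gene] = length(seq)
        best_hdr[gene] = hdr
        best_seq[gene] = seq
      }
    }
    /^>/ { flush(); hdr = $0; id = substr($1, 2); seq = ""; next }
    { seq = seq $0 }   # join multi-line sequences into one string
    END {
      flush()
      # Print genes in the order they first appeared, one sequence line each.
      for (i = 1; i <= n; i++) { g = order[i]; print best_hdr[g]; print best_seq[g] }
    }
  ' "$1"
}
```

Usage would be something like longest_isoform_per_gene Trinity.fasta > Trinity.longest.fasta (hypothetical file names), then passing the smaller file to blastx as -query. Ties in length keep the first isoform seen.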
Thanks for all the answers.
I'm going to try your recommendations.
Please confirm whether the above was true, i.e. that the job did finish properly.
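One way to confirm the run finished, and to see how many queries had no hits, is to count records in a tabular report written with -outfmt 7. The sketch below assumes, to my knowledge of BLAST+ output, that each query gets a "# Query:" comment line, queries without hits get "# 0 hits found", and a completed run ends with "# BLAST processed N queries"; it is worth checking this against your own output. blast_summary is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: summarise a BLAST+ tabular report against its query FASTA.
# ASSUMPTION: the report was written with -outfmt 7, whose comment lines include
# "# Query: ...", "# 0 hits found" for queries without hits, and a final
# "# BLAST processed N queries" line. blast_summary is a hypothetical helper name.
set -euo pipefail

blast_summary() {
  local query_fasta=$1 report=$2
  local n_input n_reported n_nohit n_withhit finished=no

  # Number of query sequences that went in (FASTA header lines).
  n_input=$(grep -c '^>' "$query_fasta" || true)
  # Queries BLAST reported on, and those among them with zero hits.
  n_reported=$(grep -c '^# Query:' "$report" || true)
  n_nohit=$(grep -c '^# 0 hits found' "$report" || true)
  # Distinct query IDs (column 1) on non-comment lines, i.e. queries with hits.
  n_withhit=$(grep -v '^#' "$report" | cut -f1 | sort -u | grep -c . || true)
  # A run that was killed part-way should lack the closing summary line.
  if grep -q '^# BLAST processed' "$report"; then finished=yes; fi

  echo "input=$n_input reported=$n_reported with_hits=$n_withhit no_hits=$n_nohit finished=$finished"
}
```

For example, blast_summary longest.pep blastp.outfmt7.tsv (the report name is hypothetical). On the original -outfmt 6 file only with_hits is meaningful, since that format has no comment lines; it counts distinct query IDs with at least one hit, which can be lower than the line count because one query may have several HSP lines.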
I ran blastx with the transcriptome against UniProt and got 97k hits.
That is an improvement indeed. Do you think you could blast against all NR too?