How to perform BLASTn on a large 348kb phage genome?
12 weeks ago

I'm working with a phage genome that is 348kb in size. When I try to run a standard BLASTn search, I'm unable to complete the analysis due to the large size of the sequence.

  1. What are the best approaches or tools for performing BLAST searches on such large genomes?
  2. Are there any specialized BLAST servers or services designed for handling sequences of this size?
  3. Would it be advisable to split the genome into smaller segments for analysis? If so, what would be the recommended way to do this?
  4. Are there any alternative alignment tools that might be more suitable for this task?
  5. What computational resources might be necessary to run a local BLAST on a sequence of this size?

Any advice or suggestions would be greatly appreciated. Thank you!

BLASTn • 766 views
12 weeks ago
Mensur Dlakic ★ 28k

When I try to run a standard BLASTn search, I'm unable to complete the analysis due to the large size of the sequence.

By the standards of genome science, that is a tiny genome. I suspect your problem is not what you think it is. Any modern computer should easily handle genomes that are hundreds of millions of bases in size, let alone a large-ish (but ultimately very small) viral genome.

We may be able to help you better if you describe what you did, what the error message was, and what type of computer you have.


My computer specification: 12th Gen Intel(R) Core(TM) i7-1260P 2.10 GHz, 64-bit operating system, x64-based processor.

I simply ran BLASTn on the NCBI website, and after 10 minutes of running it showed this error: "There was a problem with the search. Please, contact Help Desk and include RID CRT1WU1Z016.

Informational Message: [blastsrv4.REAL]: Error: Process size limit exceeded, resulting in SIGXFSZ (25)."


Now you are being lazy: you performed BLASTn against what database and with what query? You won't get far if we have to pull this out of you word by word, so I suggest you take a moment and explain this properly. One thing is for sure: the problem has nothing to do with your computer, even though you didn't bother to tell us its RAM size.

Most likely the fix is simple: run the search locally. That means downloading a local copy of BLASTn and whatever database interests you, and running the search using your own resources. Web BLAST is not meant for queries that are 348 kb in size. The limits are there to protect other users from jobs that take up a lot of resources. If, for example, you tried to search this viral genome against the whole nucleotide database, the system would not complete the job. That would easily take half a day or longer, and NCBI doesn't have those kinds of resources to spare on any individual user.
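As a rough illustration of what "running locally" involves, here is a minimal Python sketch that only assembles the two BLAST+ command lines (build a nucleotide database, then search it). It assumes the BLAST+ tools are installed and on your PATH; the file names `viral_genomes.fna`, `phage.fasta`, `viral_db` and `hits.tsv`, as well as the thread count and E-value cutoff, are placeholders chosen for the example, not recommendations.

```python
import shlex


def makeblastdb_cmd(fasta, db_name):
    """Command that builds a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query, db_name, out_tsv, threads=4, evalue="1e-5"):
    """Command that searches `query` against `db_name`, tabular output (outfmt 6)."""
    return [
        "blastn",
        "-query", query,
        "-db", db_name,
        "-out", out_tsv,
        "-outfmt", "6",
        "-evalue", evalue,
        "-num_threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace with your own paths.
    commands = [
        makeblastdb_cmd("viral_genomes.fna", "viral_db"),
        blastn_cmd("phage.fasta", "viral_db", "hits.tsv"),
    ]
    for cmd in commands:
        # Printed for inspection only; to execute, pass the list to
        # subprocess.run(cmd, check=True) once BLAST+ is installed.
        print(shlex.join(cmd))
```

The printed lines can be pasted into a terminal as they are. Nothing here is specific to phages; the same two steps apply to any FASTA collection you want to search against.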


I apologize for any confusion in my previous message. To clarify, my computer has 16 GB of RAM. I performed web BLASTn against the virus database (taxid Viruses: 10239). My query consists of nucleotide sequences (348 kb). After I upload the query, the search runs for about 10 minutes and then shows the error message I mentioned in my previous response.

To clarify my situation, I am familiar with downloading BLASTn. However, in the list provided at https://ftp.ncbi.nlm.nih.gov/blast/db/ I am having difficulty identifying the specific database for viruses (taxid Viruses: 10239). Thank you for your guidance. I appreciate your patience with my novice status in bioinformatics.


You could get annotated viral genomes from https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=10239&annotated_only=true&refseq_annotation=true&genbank_annotation=true

Click the check box at the top corner of the table (the one next to the word "Assembly") to select all genomes. Then choose "Download --> Download package". In the next dialog choose "RefSeq --> Fasta sequence". This will let you download ~153K genomes as a zip archive. It may take a while, so be patient.
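Once the archive is downloaded, the per-genome FASTA files need to be merged into one file before a BLAST database can be built from them. A minimal sketch, assuming the sequences sit inside the zip as files ending in `.fna` (the layout I would expect from an NCBI Datasets package, but check your own download); the function name and the file names in the usage note are made up for illustration.

```python
import zipfile


def combine_fasta_from_zip(zip_path, out_fasta, suffix=".fna"):
    """Concatenate every archive member ending in `suffix` into one FASTA file.

    Returns the number of files that were merged.
    """
    merged = 0
    with zipfile.ZipFile(zip_path) as archive, open(out_fasta, "wb") as out:
        for name in archive.namelist():
            if not name.endswith(suffix):
                continue  # skip metadata such as JSON catalogs or README files
            data = archive.read(name)
            out.write(data)
            # Make sure the next record's header starts on its own line.
            if not data.endswith(b"\n"):
                out.write(b"\n")
            merged += 1
    return merged
```

For example, `combine_fasta_from_zip("ncbi_dataset.zip", "viral_genomes.fna")` would write a single FASTA file that can then be used as the input for building a local database.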


I simply performed BLASTn on NCBI and after 10 minutes of running it shows the error "There was a problem with the search. Please, contact Help Desk and include RID CRT1WU1Z016.

This may be a transient error. Have you tried running the search again? Just to be certain: are you using the 348 kb phage genome as your query for web BLAST?


Yes, the query sequence length is 348 kb, and I have tried many times, but it shows the same error every time.


While 348 kb is below the 1M maximum allowed for web BLAST, you could try splitting the query into three or four overlapping pieces and see if that helps.
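If you go the splitting route, something along these lines would do it. This is only a sketch: the 5 kb overlap and the default of four pieces are arbitrary choices for the example, and the function names are hypothetical. It works on the bare sequence string, so you would read the sequence out of your FASTA file first.

```python
def split_with_overlap(seq, n_pieces=4, overlap=5000):
    """Split `seq` into up to `n_pieces` chunks; each chunk extends `overlap`
    bases into the next one. Returns (start, end, subsequence) tuples using
    0-based, end-exclusive coordinates on the original sequence."""
    if n_pieces < 1 or overlap < 0:
        raise ValueError("n_pieces must be >= 1 and overlap must be >= 0")
    if not seq:
        return []
    step = -(-len(seq) // n_pieces)  # ceiling division
    pieces = []
    for start in range(0, len(seq), step):
        end = min(start + step + overlap, len(seq))
        pieces.append((start, end, seq[start:end]))
    return pieces


def pieces_to_fasta(pieces, name="phage"):
    """Format the pieces as FASTA records; headers carry 1-based coordinates."""
    return "".join(
        f">{name}_{start + 1}-{end}\n{sub}\n" for start, end, sub in pieces
    )
```

Keeping the original coordinates in each header makes it straightforward to map hits from the individual pieces back onto the full genome. The overlap is there so that an alignment spanning a cut point is not lost entirely.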
