How to perform a BLAST search with a large number of query sequences?
10.2 years ago

I need to BLAST a large number of query sequences in one go. How might I go about this? I have a .rtf file containing a large number of query sequences in this format:

"MNKNEFTSIEVIPGYLGGKPFIKGTGVRVSEILDLLLAGIS
ILREYPGICNHDIDSAVSFLEAKLEMARQSQYTHEKVS"
"MNHIVYKNLKNYKYQLVKSYNFQTEIKTDLSLKIRKSEVKVFVN
LDPEGLLKIEAGYAWDGPSGPTIDTKTFIRGSLIHDALYQLMREEKLDRIKYRENADQ
LKKICLEDGMNSFRASYVYQFVRWFGESAARPKDESKEWEVAP"

where the sequences are delimited by double quotes ("). Any ideas on how I might run BLAST searches on each of them against the same database in one go?

sequence blast • 6.2k views
10.2 years ago
Michael 55k

Convert your input into FASTA format, then run a local BLAST search.


First, I would save the RTF file as plain text, and then write a script to convert it into FASTA format. You will need to invent identifiers. Watch out for different quote characters (e.g. curly left and right quotes), which may complicate this.

In future, create a plain text FASTA file directly whenever you collect sequences manually, and give them useful identifiers too.
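The conversion described above can be sketched in a few lines of Python. This is only a rough illustration: it assumes the RTF has already been saved as plain text, that every sequence sits between a pair of straight or curly double quotes, and that numbered identifiers are acceptable. The function name `quoted_to_fasta` and the `seq` prefix are made up for this example.

```python
import re

# Straight and curly double quotes that may delimit a sequence.
QUOTES = '"\u201c\u201d'


def quoted_to_fasta(text, prefix="seq"):
    """Turn quote-delimited sequences into FASTA text with invented IDs."""
    records = []
    # Each chunk between a pair of quote characters is one sequence.
    pattern = f"[{QUOTES}]([^{QUOTES}]*)[{QUOTES}]"
    for chunk in re.findall(pattern, text):
        seq = "".join(chunk.split())  # drop line breaks and spaces
        if seq:
            records.append(f">{prefix}{len(records) + 1}\n{seq}")
    return "\n".join(records) + "\n" if records else ""
```

Reading the plain-text file, passing its contents to `quoted_to_fasta`, and writing the result to a file such as `queries.fasta` (a hypothetical name) gives a single multi-sequence FASTA file.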


I didn't collect them manually. I extracted these sequences from a .gbk file using a Python script. I think I've got it appropriately formatted now. I could use some help with the local BLAST setup, though. Is there a guide or tutorial you could point me to? The NCBI website has me really confused.


In that case, I would fix your Python script so that it writes the protein sequences from the GenBank files directly in FASTA format. See e.g. http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank2fasta/
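As a rough illustration of going from GenBank straight to FASTA, here is a deliberately simplified, standard-library-only sketch. It assumes the usual flat-file layout (feature keys indented by 5 spaces, qualifiers by 21) and that each CDS carries a `/translation` qualifier. The function name `genbank_cds_to_fasta` is invented for this example, and a proper parser such as Biopython's `Bio.SeqIO` would be more robust on real files.

```python
import re


def genbank_cds_to_fasta(gbk_text):
    """Extract /translation qualifiers from CDS features as FASTA text."""
    out = []
    # Split the flat file at each CDS feature key (5 spaces, then "CDS").
    blocks = re.split(r"\n {5}CDS\s+", gbk_text)[1:]
    for n, block in enumerate(blocks, start=1):
        # Keep only this feature: stop at the next feature key or at ORIGIN.
        feature = re.split(r"\n(?: {5}\S|ORIGIN)", block, maxsplit=1)[0]
        trans = re.search(r'/translation="([^"]+)"', feature)
        if not trans:
            continue  # some CDS features have no translation
        # Prefer the protein ID, fall back to the locus tag, then a counter.
        ident = (re.search(r'/protein_id="([^"]+)"', feature)
                 or re.search(r'/locus_tag="([^"]+)"', feature))
        name = ident.group(1) if ident else f"cds{n}"
        seq = re.sub(r"\s+", "", trans.group(1))  # join wrapped lines
        out.append(f">{name}\n{seq}")
    return "\n".join(out) + "\n" if out else ""
```

Writing the FASTA directly from the GenBank annotation keeps real identifiers attached to each sequence, which makes the BLAST results much easier to trace back.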

10.2 years ago

As said above, you need to have your input in FASTA format and set up local BLAST. Some important tips:

  • Use the latest BLAST+ executables rather than the legacy ones. BLAST+ is far more up to date, has fewer bugs, and generally works better.
  • Put all your queries in one single FASTA file rather than splitting them across files. This is called query concatenation, and it speeds up the searches a lot. See the "Concatenation of queries" section here.
  • If you want XML output and use query concatenation, beware that the XML output has some inconsistencies, which are being discussed right now. Future versions of BLAST+ will probably correct that.
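To make the tips above concrete, here is a minimal sketch of driving BLAST+ from Python with one concatenated query file. It assumes the `makeblastdb` and `blastp` executables from BLAST+ are on your PATH, that both queries and database are protein sequences, and that tabular output (`-outfmt 6`) is acceptable. The helper names, the file names in the usage note, and the 1e-5 E-value cutoff are illustrative choices, not recommendations from this thread.

```python
import subprocess


def makeblastdb_cmd(db_fasta, db_name):
    """Command that builds a protein BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", db_fasta, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query_fasta, db_name, out_path, evalue="1e-5"):
    """Command that searches every query in one FASTA file in a single run."""
    return [
        "blastp",
        "-query", query_fasta,  # one file holding all query sequences
        "-db", db_name,
        "-out", out_path,
        "-outfmt", "6",         # tabular output, one hit per line
        "-evalue", evalue,
    ]


def run_local_blast(query_fasta, db_fasta, db_name, out_path):
    """Build the database, then run the search; raises if either step fails."""
    subprocess.run(makeblastdb_cmd(db_fasta, db_name), check=True)
    subprocess.run(blastp_cmd(query_fasta, db_name, out_path), check=True)
```

A call such as `run_local_blast("queries.fasta", "db.fasta", "mydb", "hits.tsv")` would then produce one tabular results file covering all queries. Choosing tabular output here also sidesteps the XML inconsistencies mentioned above.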
10.2 years ago
vaskin90 ▴ 290

There are BLAST elements in UGENE Workflow Designer. You can create a scheme with either local or remote BLAST elements and feed it with all the sequences that you want to process.
