I need to BLAST a large number of query sequences in one go. How might I go about this? I have an .rtf file containing a large number of query sequences in this format:
"MNKNEFTSIEVIPGYLGGKPFIKGTGVRVSEILDLLLAGIS
ILREYPGICNHDIDSAVSFLEAKLEMARQSQYTHEKVS"
"MNHIVYKNLKNYKYQLVKSYNFQTEIKTDLSLKIRKSEVKVFVN
LDPEGLLKIEAGYAWDGPSGPTIDTKTFIRGSLIHDALYQLMREEKLDRIKYRENADQ
LKKICLEDGMNSFRASYVYQFVRWFGESAARPKDESKEWEVAP"
where the sequences are delimited by double quotes ("). Any ideas on how I might run BLAST searches on each of them against the same database in one go?
First I would save the RTF file as plain text, then write a script to convert it into FASTA format. You will need to invent identifiers. Watch out for different quote characters (e.g. curly left and right quotes), which may complicate this.
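A minimal sketch of such a conversion script, assuming the plain-text export keeps each sequence between a pair of double quotes (straight or curly) and that line breaks inside a sequence can simply be dropped. The function name `quoted_to_fasta` and the invented identifiers (`query_1`, `query_2`, ...) are illustrative only:

```python
import re

# Straight and curly double quotes are all treated as delimiters.
QUOTES = "\"\u201c\u201d"


def quoted_to_fasta(text, prefix="query", width=60):
    """Convert double-quoted, possibly multi-line sequences to FASTA text.

    Identifiers are invented as <prefix>_1, <prefix>_2, ... in input order.
    """
    pattern = re.compile("[%s]([^%s]+)[%s]" % (QUOTES, QUOTES, QUOTES))
    records = []
    for number, match in enumerate(pattern.finditer(text), start=1):
        # Remove line breaks and any other whitespace inside the sequence.
        seq = re.sub(r"\s+", "", match.group(1))
        # Wrap the sequence at a fixed line width.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append(">%s_%d\n%s\n" % (prefix, number, "\n".join(lines)))
    return "".join(records)


# Small in-memory example; in practice the text would be read from the
# plain-text export and the result written to a .fasta file.
example = '"MNKNEFTS\nILREYPGI"\n\u201cMNHIVYKN\nLDPEGLLK\u201d\n'
print(quoted_to_fasta(example), end="")
```

Check the record count of the output against the number of sequences you expect, since a stray or unmatched quote would shift every later pairing.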
In future, create a plain text FASTA file directly whenever you collect sequences manually, and give them useful identifiers too.
I didn't manually collect them. I extracted these sequences from a .gbk file using a Python script. I think I've got it appropriately formatted now. Could use some help with the local BLAST stuff though. Any guide/tutorial that you could point me to? The NCBI website has me really confused.
In that case, I would fix your Python script so that it writes the protein sequences from the GenBank files directly in FASTA format. See e.g. http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank2fasta/
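As for the local BLAST part: once all queries are in a single FASTA file, one run of the standalone NCBI BLAST+ tools will search every sequence against the same database. The following is only a sketch, assuming BLAST+ is installed and on your PATH, that the queries are proteins (hence `blastp`), and that the file and database names shown are placeholders you would replace:

```python
import subprocess


def makeblastdb_command(fasta_path, db_name):
    """Build the command that formats a protein FASTA file as a BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def blastp_command(query_fasta, db_name, output_path, evalue=1e-5):
    """Build a blastp command for a multi-sequence FASTA query file.

    "-outfmt 6" requests tabular output, one line per hit, with the query
    identifier in the first column.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db_name,
        "-out", output_path,
        "-outfmt", "6",
        "-evalue", str(evalue),
    ]


def run(command):
    """Run a command and raise an error if it exits with a non-zero status."""
    subprocess.run(command, check=True)


# Example usage (placeholder file names; requires BLAST+ to be installed):
# run(makeblastdb_command("subjects.fasta", "subjects_db"))
# run(blastp_command("queries.fasta", "subjects_db", "results.tsv"))
```

Because the tabular output carries the query identifier on each line, hits for the different query sequences can be separated afterwards, which is another reason to give the sequences useful identifiers.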