Question

How to start with blast and a local database

0

Entering edit mode

6.4 years ago

gbl1 ▴ 80

Hi,

I have a "complete" genome of a plant in a fasta file (16Go) I need to find some part of it, so I use blast. My first issue, when creating a database, it only accept 1Go… meaning most of my reads are not considered. How to proceed to get all my reads in a given database. Then, If I blast a sequence from another species in my db, how to get the output in a fasta file? By the way, is it possible to use a local blast through a browser, looking like NCBI, instead of a terminal? Eventually, is it possible to obtain a consensus sequence from those sequences? Or doing a local assembly.

Thanks in advance

Assembly blast genome • 6.8k views

ADD COMMENT • link updated 6.4 years ago by colindaven 7.0k • written 6.4 years ago by gbl1 ▴ 80

0

Entering edit mode

Can you clarify what you mean with 'Go'?

ADD REPLY • link 6.4 years ago by WouterDeCoster 47k

1

Entering edit mode

giga octet?

Is that obsolete word?

ADD REPLY • link 6.4 years ago by gbl1 ▴ 80

3

Entering edit mode

I don't think so ... though somewhat french-centric ;)

just not very common (if existing at all) in english language , where it's apparently even translated as 'gigabyte'

ADD REPLY • link 6.4 years ago by lieven.sterck 15k

0

Entering edit mode

I'm guessing he means 'Gb'

ADD REPLY • link 6.4 years ago by lieven.sterck 15k

0

Entering edit mode

Just answering this part:

" is it possible to use a local blast through a browser"

Yes you can.

here is the tool: https://openwetware.org/wiki/Wwwblast

bonus: NCBI workbench but it have more than blast. https://www.ncbi.nlm.nih.gov/tools/gbench/

ADD REPLY • link 6.4 years ago by Medhat 9.8k

0

Entering edit mode

I have a "complete" genome of a plant in a fasta file (16Go)

Is this "complete" plant genome an assembled genome? Or raw sequencing reads?

ADD REPLY • link 6.4 years ago by h.mon 35k

score 5 · Answer 1 · 2018-07-05

5

Entering edit mode

6.4 years ago

colindaven 7.0k

To answer one part of your question, there is a graphical blast web client which can be easily installed locally using docker. It is still a bit buggy though. It may also be able to index your sequence properly, not sure what your issue is there really.

https://github.com/wurmlab/sequenceserver

ADD COMMENT • link 6.4 years ago by colindaven 7.0k

0

Entering edit mode

installation failed...

ADD REPLY • link 6.4 years ago by gbl1 ▴ 80

1

Entering edit mode

That was not really informative enough to help any further. Just a recommendation anyway.

ADD REPLY • link 6.4 years ago by colindaven 7.0k

0

Entering edit mode

Thanks @colindaven for recommending sequenceserver as a BLAST interface. Hi @gbl1 - it looks like you struggled with one of the other recommendations as well. If you can provide some details as to how the installation failed, perhaps we can help you get sequenceserver running... Note that the github repository is more for developers and the normal install mechanisms - using ruby or docker are at http://sequenceserver.com Please let us know!

ADD REPLY • link 4.5 years ago by Yannick Wurm ★ 2.5k

score 1 · Answer 2 · 2018-07-03

1

Entering edit mode

6.4 years ago

gb ★ 2.2k

Lots of questions, maybe you should break it down. There is also an other post that has the same question about the webblast. A: local web BLAST server

If the reference genome is to big (your pc has not enough memory) you could split your reference plant genome and make more smaller databases. To BLAST against everything you can make a .nal file https://www.ncbi.nlm.nih.gov/books/NBK279693/. You can also add all the database in de command like.

blastn -query query.fa -db "part1 part2 part3" -out output

Here you can see how you can get the output in a fasta file: Blast+ results file parsing to fasta file you need to use a custom output and format it in a fasta format afterwards.

Not sure about the consensus part, maybe you should use something like bowtie or BWA to map and use samtools to make a consensus. With blast you only have two sequences the target and the query. For a consensus you need at least 3?

ADD COMMENT • link 6.4 years ago by gb ★ 2.2k

1

Entering edit mode

As complementary to the answer, here is a small article regarding the size of rams and building blast db:

http://etutorials.org/Misc/blast/Part+IV+Industrial-Strength+BLAST/Chapter+11.+BLAST+Databases/11.2+BLAST+Databases/

ADD REPLY • link 6.4 years ago by Medhat 9.8k

0

Entering edit mode

That is a nice page. I am not a big fan of how ncbi explains BLAST

ADD REPLY • link 6.4 years ago by gb ★ 2.2k

score 1 · Answer 3 · 2018-07-05

1

Entering edit mode

6.4 years ago

gbl1 ▴ 80

Hi,

So, let me tell you how it went:

-First, I found a small program called "fasta-splitter.pl" that helped me having smaller and suitable fasta files.

-I could make 6 databases, meaning: progress, but the 10 more are still missing…

-making an alias DB failed…

-for getting close enough to fasta: blastn -query file.txt -outfmt '6 qseqid sseqid sseq evalue' -db "db1 db2 db3 …" has worked

I about dead by now…

ADD COMMENT • link 6.4 years ago by gbl1 ▴ 80

0

Entering edit mode

Good to hear that you made progress, also you are getting more familiar with blast, tools and the command line. It is always hard to start with new things like this. And yes it is exhausting sometimes. Keep up the work, you can do it =)

ADD REPLY • link 6.4 years ago by gb ★ 2.2k

0

Entering edit mode

I am familiar with commande lines… but… I need to get what are the arguments to use… thats another issue!

blastn -query file.txt -outfmt '6 qseqid sseqid sseq evalue' -db "db1 db2 db3 …" is not what I look for: it only returns the aligned sequence… My aim is to know what just outside! No point in computing what I already know!

Is there any arguments of blast that would return all the full sequences that match?

ADD REPLY • link 6.4 years ago by gbl1 ▴ 80

1

Entering edit mode

I do not see that option in the outfmt possibilities, you can write a small script for that. But this page did not helped you? Blast+ results file parsing to fasta file

ADD REPLY • link 6.4 years ago by gb ★ 2.2k

0

Entering edit mode

I read it several time… It does not do what I want for it only gives a fasta with the part of the sequence that are matching… technically, it gives me only what I allready have!

ADD REPLY • link 6.4 years ago by gbl1 ▴ 80

0

Entering edit mode

oh I see it now. You could do something like this in python:

from Bio import SeqIO
hits = set()
with open("blastoutput", "r") as blastoutput:
    for hit in blastoutput:
        hits.add(hit.split("\t")[0])

with open("fastafile", "r") as fasta, open("output.fa", "a") as output:
    for record in SeqIO.parse(fasta, "fasta"):
        if strrecord.id) in hits:
            output.write(">"+str(record.description)+"\n")
            output.write(">"+str(record.seq)+"\n")

Think it is not the cleanest way, but just give an example on top of my head

ADD REPLY • link 6.4 years ago by gb ★ 2.2k

0

Entering edit mode

looks cool... how to use that script? where do I tell blast the db to use? and to tell my input file?

ADD REPLY • link 6.4 years ago by gbl1 ▴ 80

0

Entering edit mode

Blast will never return anything else than aligned sequences (actually alligned HSPs not even sequences). If you want to get the full sequences you will need to parse the blast result, get the hit IDs and extract those from the blastDB(s) you used.

blastdbcmd -db" db1 db2 db3" -entry <hit ID>

After which you can then do a global aligned of them using a different software

ADD REPLY • link 6.4 years ago by lieven.sterck 15k

0

Entering edit mode

In addition to this:

#!/bin/sh

awk -F "\t"{print $1}' blastoutput.txt  | 
while read line; do
    blastdbcmd  -db" db1 db2 db3" -entry $line
done

ADD REPLY • link 6.4 years ago by gb ★ 2.2k

1

Entering edit mode

or simply use the -entry_batch option, which takes a file with hit IDs as input

ADD REPLY • link 6.4 years ago by lieven.sterck 15k