Question

Sensitive (BLAST like) and fast alignment of millions of sequences against human sized reference genomes

6.8 years ago

William ★ 5.3k

Hi,

What technology (both hardware and software) is out there that allows sensitive (BLAST-like) and fast alignment of millions of sequences (ca. 200 bp) against multiple reference genomes?

I know of:

- Standard software BLAST: very sensitive, but slow for many millions of sequences against multiple human-sized references.
- Hardware-accelerated BLAST: sensitive and fast, but requires specialized and expensive hardware.
- BWA/SNAP/minimap2: fast and need no specialized hardware, but not sensitive enough for all use cases. They achieve their speed (partially) through exact matching on seeds instead of full alignments, and they often miss alignments that BLAST discovers.
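
To make the seeding argument concrete, here is a deliberately simplified Python toy model; it is not how BWA, SNAP or minimap2 work internally. It assumes the query is an ungapped copy of the reference with one substitution every 15 bp (about 93% identity) and treats a seed as "found" if some run of at least k identical bases exists. k=19 is BWA-MEM's default minimum seed length and k=11 is the default word size of BLAST's `blastn` task; real tools add re-seeding, spaced seeds or minimizers, so take this only as intuition for why such hits can be missed.

```python
# Toy model of why exact-match seeding can miss a BLAST-detectable hit.
# Assumptions (hypothetical, for illustration only): the query is an
# ungapped copy of the reference with one substitution every `step` bases,
# and "seeding succeeds" means some run of >= k identical bases exists.


def mutate_every(seq: str, step: int) -> str:
    """Substitute every `step`-th base, always changing it to a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in range(step - 1, len(chars), step):
        chars[i] = swap[chars[i]]
    return "".join(chars)


def longest_exact_run(query: str, target: str) -> int:
    """Longest run of identical bases along the single (ungapped) diagonal."""
    best = run = 0
    for q, t in zip(query, target):
        run = run + 1 if q == t else 0
        best = max(best, run)
    return best


reference = ("ACGGTCATTGAC" * 17)[:200]  # arbitrary 200 bp sequence
query = mutate_every(reference, 15)       # 13 mismatches over 200 bp

run = longest_exact_run(query, reference)
identity = sum(q == t for q, t in zip(query, reference)) / len(reference)
print(f"identity={identity:.1%} longest exact run={run}")
for k in (19, 11):
    print(f"k={k}: seed found = {run >= k}")
```

Here the longest exact run is 14 bp, so a 19 bp seed never fires while an 11 bp word does, even though the two sequences are 93.5% identical.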

The use cases are:

- Determining which (ca. 200 bp) sequences are unique or multi-copy in a given reference.
- Lifting over (ca. 200 bp) sequences from one reference genome to another (for non-model organisms, where the chain files that all the liftover tools seem to use are not available).
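
For the first use case (and a crude version of the second), one way to post-process BLAST results is sketched below in Python. It assumes BLAST's default tabular output (`-outfmt 6`, 12 columns) and known query lengths; the 90% identity and 90% query-coverage thresholds are hypothetical and would need tuning. A query with exactly one qualifying hit is called unique, and that hit's interval doubles as a naive lift-over position. The example hits are made up.

```python
from collections import defaultdict

# Column order of BLAST's default tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def classify_hits(blast_tsv, query_lengths, min_identity=90.0, min_coverage=0.9):
    """Label each query 'unique', 'multi-copy' or 'unplaced'.

    Thresholds are hypothetical defaults. For a unique query the single hit
    is returned as (target_seq, start, end, strand), 1-based and inclusive
    as in BLAST output, i.e. a naive lift-over of the whole fragment.
    """
    hits = defaultdict(list)
    for line in blast_tsv.strip().splitlines():
        rec = dict(zip(FIELDS, line.split("\t")))
        qlen = query_lengths[rec["qseqid"]]
        if float(rec["pident"]) < min_identity:
            continue  # too divergent to count as a copy
        qcov = (int(rec["qend"]) - int(rec["qstart"]) + 1) / qlen
        if qcov < min_coverage:
            continue  # partial hit, e.g. a shared repeat fragment
        sstart, send = int(rec["sstart"]), int(rec["send"])
        strand = "+" if sstart <= send else "-"
        hits[rec["qseqid"]].append(
            (rec["sseqid"], min(sstart, send), max(sstart, send), strand))

    labels = {}
    for qid in query_lengths:
        found = hits.get(qid, [])
        if not found:
            labels[qid] = ("unplaced", None)
        elif len(found) == 1:
            labels[qid] = ("unique", found[0])
        else:
            labels[qid] = ("multi-copy", found)
    return labels


# Made-up example hits (tab-separated, outfmt 6 column order).
example = "\n".join([
    "frag1\tchr2\t99.5\t200\t1\t0\t1\t200\t1000\t1199\t1e-100\t365",
    "frag2\tchr1\t98.0\t200\t4\t0\t1\t200\t5000\t4801\t1e-95\t350",
    "frag2\tchr7\t97.0\t198\t6\t0\t1\t198\t800\t997\t1e-90\t340",
])
lengths = {"frag1": 200, "frag2": 200, "frag3": 200}
for qid, label in classify_hits(example, lengths).items():
    print(qid, label)
```

One caveat: BLAST can report several HSPs for the same locus, so in practice you may want to merge overlapping hits on the same target before counting copies.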

I am looking for something that:

- has the quality/sensitivity of normal BLAST
- is fast enough to align millions of sequences
- is cost-effective
- is reliable and user-friendly

Written 6.8 years ago by William ★ 5.3k; last updated by lieven.sterck 15k

Doing a hands-on trial with different NGS aligners (with your own data, which we can't see) is the only logical approach. I would recommend looking at BBMap. It has plenty of options you can tune to see whether you can optimize the alignments.

Even though you have posed this as a question, it feels like you already know the answer and are just looking for second opinions on a strategy you have in mind, to see if they match :-)

Reply by GenoMax 147k, 6.8 years ago

Why is BBMap closer to BLAST in sensitivity than other short-read aligners such as BWA/SNAP/minimap2? And is it still fast, given that it is written in Java?

Reply by William ★ 5.3k, 6.8 years ago

Don't judge BBMap by the fact that it is written in Java. @Brian Bushnell wrote it very efficiently, and it can hold its own against any aligner out there now. It is multithreaded and fully supports pigz.

> Why is BBMAP more close to the sensitivity of BLAST

I have not looked at that specifically. What I do know is that there are many alignment options you can tweak. For example, getting all possible multiple alignments across the genome is as simple as setting ambig=all.

Reply by GenoMax 147k, 6.8 years ago

score 2 · Answer 1 · 2018-02-08

6.8 years ago

Matteo Schiavinato ★ 3.6k

Given that your sequences are only a little longer than the average Illumina read, I would still go for HISAT2. This way you retain as much information as possible in the BAM output, from which it is easy to extract the sequences that have secondary alignments.
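
A minimal Python sketch of that extraction step, assuming the alignments have already been converted to SAM text (for example with `samtools view -h`) and that the aligner was asked to report more than one alignment per read. It relies only on the SAM FLAG bits 0x100 (secondary alignment) and 0x4 (unmapped); the reads in the example are invented. On the command line, `samtools view -f 256` selects the same secondary records.

```python
SECONDARY = 0x100  # SAM FLAG bit: secondary alignment
UNMAPPED = 0x4     # SAM FLAG bit: segment unmapped


def reads_with_secondary(sam_text):
    """Return the names of reads with at least one mapped secondary alignment."""
    names = set()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # header or blank line
        qname, flag = line.split("\t")[:2]
        flag = int(flag)
        if (flag & SECONDARY) and not (flag & UNMAPPED):
            names.add(qname)
    return names


# Invented example: frag2 has a primary hit on chr1 and a secondary one on chr7.
sam = "\n".join([
    "@HD\tVN:1.6\tSO:unsorted",
    "frag1\t0\tchr2\t1000\t60\t200M\t*\t0\t0\t*\t*",
    "frag2\t16\tchr1\t4801\t1\t200M\t*\t0\t0\t*\t*",
    "frag2\t256\tchr7\t800\t1\t200M\t*\t0\t0\t*\t*",
    "frag3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
])
print(sorted(reads_with_secondary(sam)))  # ['frag2']
```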

Why is HISAT2 closer to BLAST in sensitivity than other short-read aligners such as BWA/SNAP/minimap2? Is it because of its multiple index types?

Reply by William ★ 5.3k, 6.8 years ago

I should have said "I would still go for a read alignment program"; HISAT2 is just the one that came to mind. You can put BWA or STAR there, same message!

Reply by Matteo Schiavinato ★ 3.6k, 6.8 years ago

score 1 · Answer 2 · 2018-02-08

6.8 years ago

lieven.sterck 15k

Have you considered DIAMOND? BLAST-like, super fast and quite accessible.

Alternatively, you can of course simply parallelize a 'normal' BLAST search on a computer cluster (it sounds like you have an appropriate setup to split the search into many subparts).
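
The main preparation such a cluster run needs is splitting the query set. Below is a small Python sketch (the helper names are hypothetical) that distributes FASTA records round-robin into N chunks, one per job; each chunk would then be written to its own file and searched with the same BLAST command against the same database.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)  # multi-line sequences are joined
    if header is not None:
        yield header, "".join(seq)


def split_fasta(text, n_chunks):
    """Distribute records round-robin into n_chunks FASTA strings (one per job)."""
    chunks = [[] for _ in range(n_chunks)]
    for i, (header, seq) in enumerate(read_fasta(text)):
        chunks[i % n_chunks].append(f">{header}\n{seq}\n")
    return ["".join(chunk) for chunk in chunks]


queries = "".join(f">frag{i}\nACGTACGTAC\n" for i in range(1, 6))
for n, chunk in enumerate(split_fasta(queries, 2)):
    print(f"chunk {n}: {chunk.count('>')} records")
```

Since each query is searched independently, the per-chunk tabular outputs should simply be concatenable afterwards.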

DIAMOND only searches against protein databases, so it is likely not what the OP wants.

Reply by GenoMax 147k, 6.8 years ago

PLAST covers all the BLAST-like programs (blastn, blastp, blastx, etc.), and in my testing it is faster and more sensitive than DIAMOND.

Reply by Jake Warner ▴ 840, 6.8 years ago

Correct, I am aligning DNA fragments of ca. 200 bp against indexed DNA genomes of human size and smaller.

Reply by William ★ 5.3k, 6.8 years ago

Indeed, my bad.

And I assume you have already considered megablast as well?

Reply by lieven.sterck 15k, 6.8 years ago