NCBI MEGABLAST versus UCSC BLAT
Posted 7 months ago by a615ebfb

Hi All, I have a very large set of 100-mer nucleotide sequences (over 20 million) that I'd like to align against a large number of reference bacterial genomes. I am looking for exact matches as well as more distant matches with up to 10 mismatches. Should I use megablast or blat?

Are there any documents or literature that someone could point me to that compare the two?
I need to stick with either megablast or blat for the moment.

There is so much material out there about alignment, but I cannot find any direct comparisons. Thanks!

Tags: blast, blat, alignment, megablast

Consider the following about BLAT:

BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 20 bases.
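To put the 95% figure in context for this question, here is a back-of-the-envelope check. It is a sketch that assumes identity is simply matching bases divided by read length in an ungapped alignment (indels ignored). A 100-mer with 10 mismatches is only 90% identical, below the similarity range BLAT is described as targeting. The function names are hypothetical.

```python
import math

def identity(read_len: int, mismatches: int) -> float:
    """Fraction of matching bases in an ungapped alignment (indels ignored)."""
    return (read_len - mismatches) / read_len

def max_mismatches(read_len: int, min_identity: float) -> int:
    """Largest mismatch count that still meets min_identity over read_len bases.
    The small epsilon guards against float error, e.g. 100 * (1 - 0.9) = 9.999..."""
    return math.floor(read_len * (1 - min_identity) + 1e-9)

print(identity(100, 10))          # → 0.9
print(max_mismatches(100, 0.95))  # → 5
```

In other words, at 95% identity a 100-mer can carry only about 5 mismatches, half of what the question asks for, so hits in the 6 to 10 mismatch range fall into the region where BLAT "may miss" alignments.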

DNA BLAT works by keeping an index of the entire genome in memory. If you have a "large" number of genomes, you may need to search only a subset of them at a time.

You may want to use Magic-BLAST (magicblast) for your 100-mers instead: https://ncbi.github.io/magicblast/

7 months ago

For 20 million 100-mers, you might want to try short-read mappers such as bwa-mem2 or bowtie2.


I am looking for exact matches as well more distant matches up to 10 mismatches.

This requirement may make either of those unsuitable.
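A pigeonhole argument shows one reason why. This is a sketch that assumes the mappers start only from exact seed matches and ignores indels; the seed lengths 19 and 22 below are illustrative values, so check each mapper's manual for its actual defaults. k mismatches split a read of length L into k + 1 mismatch-free stretches, so the longest stretch is only guaranteed to be ceil((L - k) / (k + 1)) bases. For a 100-mer with 10 mismatches that is 9 bases, shorter than either seed length. The function names are hypothetical.

```python
import math

def guaranteed_exact_run(read_len: int, mismatches: int) -> int:
    """Longest mismatch-free run guaranteed to exist when the mismatches are
    placed adversarially: (read_len - mismatches) matching bases spread over
    (mismatches + 1) runs, so at least one run has the ceiling of the average."""
    return math.ceil((read_len - mismatches) / (mismatches + 1))

def worst_case_read(read_len: int, mismatches: int) -> str:
    """Match/mismatch pattern ('=' match, 'X' mismatch) with the mismatches
    spread as evenly as possible, so no run exceeds the bound above."""
    runs = mismatches + 1
    base, extra = divmod(read_len - mismatches, runs)
    # The first `extra` runs get one extra base so the total length is read_len.
    parts = ["=" * (base + (1 if i < extra else 0)) for i in range(runs)]
    return "X".join(parts)

def longest_run(pattern: str) -> int:
    """Length of the longest stretch of '=' between mismatches."""
    return max(len(run) for run in pattern.split("X"))

pattern = worst_case_read(100, 10)
print(len(pattern), pattern.count("X"), longest_run(pattern))  # → 100 10 9
print(guaranteed_exact_run(100, 10))                           # → 9
# Would an exact seed of each (illustrative) length fit anywhere in this read?
print({s: longest_run(pattern) >= s for s in (19, 22)})        # → {19: False, 22: False}
```

This is a worst case: mismatches placed at random usually leave longer exact stretches, and both mappers let you shorten the seed length, so treat it as a reason to benchmark sensitivity on simulated reads rather than as proof that they will fail.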


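I guess it should be possible to select only the desired mappings by parsing the alignment output. Note that the CIGAR string alone does not separate matches from mismatches (both are reported as M), so you would use the NM tag (edit distance) or the MD tag alongside it.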

Alternatively, https://github.com/smarco/gem3-mapper allows you to filter mappings by precise global alignment identity, which in this case would be 90-100%.
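A minimal sketch of the post-filtering idea, assuming the mapper writes SAM with the optional NM:i tag (edit distance to the reference: mismatches plus inserted and deleted bases; bwa-mem and bowtie2 both emit it). The CIGAR string is used only to count aligned and soft-clipped read bases. Counting soft-clipped bases as edits is a design choice assumed here, so that partial local alignments cannot slip under the 10-mismatch limit from the question. The function names and the example records are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# CIGAR operations as (length, op) pairs, e.g. "80M20S" -> [("80", "M"), ("20", "S")]
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_counts(cigar: str) -> Tuple[int, int]:
    """Return (read bases inside the alignment, soft-clipped read bases)."""
    aligned = clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op in "MI=X":
            aligned += int(length)
        elif op == "S":
            clipped += int(length)
    return aligned, clipped

def nm_tag(fields: List[str]) -> Optional[int]:
    """Edit distance from the optional NM:i tag (columns 12+), or None if absent."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None

def filter_sam(lines: Iterable[str], max_edits: int = 10) -> Iterator[Tuple[str, int, float]]:
    """Yield (read name, edits, approximate identity) for mapped records whose
    NM edit distance plus soft-clipped bases is at most max_edits."""
    for line in lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        nm = nm_tag(fields)
        if nm is None:
            continue  # cannot judge the mismatch count without NM
        aligned, clipped = cigar_counts(cigar)
        edits = nm + clipped  # assumption: clipped bases count against the read
        if edits <= max_edits:
            read_bases = aligned + clipped
            identity = 1 - edits / read_bases if read_bases else 0.0
            yield fields[0], edits, identity

# Hypothetical records (SEQ and QUAL set to "*" for brevity).
sam = [
    "@SQ\tSN:genomeA\tLN:5000",
    "read1\t0\tgenomeA\t100\t60\t100M\t*\t0\t0\t*\t*\tNM:i:0",
    "read2\t0\tgenomeA\t300\t60\t100M\t*\t0\t0\t*\t*\tNM:i:8",
    "read3\t0\tgenomeA\t500\t60\t100M\t*\t0\t0\t*\t*\tNM:i:12",
    "read4\t0\tgenomeA\t700\t60\t80M20S\t*\t0\t0\t*\t*\tNM:i:2",
    "read5\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print([name for name, _, _ in filter_sam(sam)])  # → ['read1', 'read2']
```

read3 exceeds 10 edits, read4 has only 2 edits inside its alignment but 20 clipped bases, and read5 is unmapped. Because NM also counts indels, the "identity" here is an approximation rather than a strict mismatch-only figure.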

Posted 7 months ago by noodle

Do you need to 'align' or 'assign'? If the latter, it might also be worth checking out kraken2.
