Compare two very close genomes (.fasta)
2
4
Entering edit mode
19 months ago
A_heath ▴ 170

Hi,

I have two fasta files corresponding to two close bacterial genomes (same species, and very close strains). I would like to find unique regions within the reference genome when compared to the other one. The main goal here is to be able to design PCR primers that will allow the specific identification of one of these strains. Note that both genomes are circularized and in .fasta formats.

I tried using Mugsy and Mauve for whole genome alignments but these tools aren't precise enough to differentiate two close genomes. So I tried Harvesttools (parsnp), or GSAlign to detect small variations but do you have any other recommendations, please?

For information, the ANI between these two genomes is 99.84%.

Thank you very much for your help!

Genome • 2.4k views
ADD COMMENT
1
Entering edit mode

Have you tried as standard BLAST search for local pairwise alignments?

ADD REPLY
1
Entering edit mode

Can these types of searches be done on whole genomes? Because I did not target any specific region that may contain nucleotide differences..

ADD REPLY
1
Entering edit mode

Of course! It should not take too long either as bacterial genomes are not that big and you're only searching one against the other which should be quick.

ADD REPLY
0
Entering edit mode

two close bacterial genomes

Important question is are there rearrangements that you know of? If there are then trying to align/compare two genomes using methods below may miss things.

ADD REPLY
4
Entering edit mode
19 months ago
Darked89 4.7k

This is probably not the optimal procedure, but it may work:

  1. use LAST for alignment: https://gitlab.com/mcfrith/last
  2. instead of doing whole genome to genome alignment use genome_A as database and chopped i.e. 100bp overlapping by 50bp fragments from genome_B
  3. you discard all fragments with a perfect match and look for the lowest score matches. Better yet, try to find fragments not aligned to genome_A if possible.
  4. reverse the procedure genome_B as database, genome_A chopped as fragments to find unique genome_A fragments
ADD COMMENT
1
Entering edit mode

Thank you very much for your answer. So I created a database with genome_A using lastdb and performed an alignment with genome_B using lastal. How can I chopped the genome by 100 bp overlapping 50 bp? Can this be done using LAST? and how can I retrieve fragments with the lowest score matches? Sorry for my ignorance with this tool..

ADD REPLY
0
Entering edit mode

One of the ways to get the overlapping fragments from a fasta is to use bedtools makewindows followed by bedtools getfasta:

bedtools makewindows -g genomeB_sizes_file -w 100 -s 50 > genomeB_windows100_50.bed
bedtools getfasta -fi genomeB.fas -bed genomeB_windows100_50.bed > genomeB_windows100_50.fas

you will use then the genomeB_windows100_50.fas to map to genomeA LAST database.

ADD REPLY
1
Entering edit mode

Since a fasta query has been created genomeB_windows100_50.fas, it can be used with genomeA with minimap2 and a standard BAM file can be generated.

ADD REPLY
0
Entering edit mode

Good idea. But I would probably go with the PAF output for parsing.

ADD REPLY
0
Entering edit mode

LAST outputs the results in a MAF format. You may search for lines with scores: grep '^a score your_result.maf | sed 's/=/\t/g' > scores.txt` and sort it to get the idea about the lowest scores.

Or use maf-convert` to get a tab or psl file. https://gitlab.com/mcfrith/last/-/blob/main/doc/maf-convert.rst

To get the non-mapped fragments you need to compare two lists:

  1. fragment names from genomeB_windows100_50.fas
  2. fragment names from the LAST output

    sort fasta_names mapped_names | uniq -c | sort -k1n > ordered_counts_fragments.txt

Fragments occuring only once (1 fragment_name) is what you look for

ADD REPLY
4
Entering edit mode
19 months ago

Try unikmer, which can be used to find unique regions between multiple genomes.

How to get the sequence differences between multiple bacterial genomes

ADD COMMENT
1
Entering edit mode

Thank you very much for your suggestion! We tried unikmer and it works very great for our need.

ADD REPLY

Login before adding your answer.

Traffic: 1049 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6