bacterial genome assembly output from canu
3.9 years ago
rthapa ▴ 90

Hi,

I have done a de novo genome assembly of a bacterial strain using Canu, and I want to find structural variants relative to the reference genome. Because the bacterial genome is circular, the assembly does not necessarily start at the same position as the reference (for example, at the origin of replication), which makes it hard to align the two. Does anyone have suggestions for finding structural variants in a bacterial genome by comparison with the reference genome?

Thanks

bacteria genome assembly canu • 1.6k views

This question has been asked a couple of different ways: de novo genome assembly of bacterial genome

rthapa : Have you tried to repeat the assembly? Perhaps you will get an assembly that will be co-linear with the reference.


Yes, I repeated the assembly after removing reads shorter than 2000 bp. The result is similar. I think the issue is that the origin of replication is not identified correctly during genome assembly.


Where is the dnaA gene located in your assembly and where is it in the reference? Paper1 and Paper2.

3.9 years ago
shelkmike ★ 1.4k

There are many ways to do this. I suggest the following:
1) Align your genome to the reference genome using pairwise megablast.
2) From the alignment, find the position in your genome that corresponds to the first position of the reference. Then rotate your genome sequence so that it starts at this position.
3) Align the rotated genome to the reference genome again and look at the dotplot. Structural differences will be visible on the dotplot.
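
Step 2 can be sketched in a few lines of plain Python. This is only a rough illustration, not the method of any particular tool: the function names (`rotate_circular`, `reverse_complement`, `read_single_fasta`, `write_fasta`) are made up for this example, and it assumes the assembly is a single circular contig and that you have already read the new start position (1-based, as BLAST reports it) off the alignment.

```python
# Minimal sketch: rotate a circular contig so it starts at a chosen position.
# All function names here are illustrative, not part of any library.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only; other characters,
    such as N, are left unchanged by the translation)."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_circular(seq, new_start):
    """Return seq rotated so that the base at 1-based position new_start
    becomes the first base. Assumes seq represents a circular molecule."""
    if not seq:
        raise ValueError("empty sequence")
    offset = (new_start - 1) % len(seq)
    return seq[offset:] + seq[:offset]


def read_single_fasta(path):
    """Read a FASTA file that is assumed to contain exactly one record."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    raise ValueError("expected a single-record FASTA file")
                header = line[1:]
            else:
                chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def write_fasta(path, header, seq, width=60):
    """Write one record, wrapping the sequence at `width` characters."""
    with open(path, "w") as handle:
        handle.write(">" + header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Toy demonstration on an in-memory sequence.
toy = "ACGTTGCA"
print(rotate_circular(toy, 4))  # TTGCAACG
```

If the first alignment shows your contig on the opposite strand to the reference, one option is to apply `reverse_complement` first, re-run the alignment to get the new coordinate, and only then rotate, since the start position changes after reverse-complementing.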

3.9 years ago
juanjo75es ▴ 130

I think QUAST is a good tool for that. It aligns the assembly to the reference regardless of where the circular sequence happens to start. I also think it is useful to produce two assemblies with two different assemblers, because sometimes it is just the assembler that fails.

Here is a likely real rearrangement, supported by three different algorithms (SPAdes, rnaSPAdes and Contignant s-aligner):

Alignment of reads obtained with SPAdes, rnaSPAdes and Contignant s-aligner

Here is a false rearrangement detected with SPAdes:

Alignment of reads obtained with SPAdes
