Question

de novo assembly of circular plasmid

1

Entering edit mode

6.2 years ago

kspata ▴ 90

Hi All,

I have a circular plasmid sample which was sequenced on MiSeq PE 300. I performed a de novo aseembly using SPAdes. The contig which I obtained is 11526 bp long. The reference sequence for this plasmid is 12089bp long. When I BLAST the reference sequence against the de novo assembled contig I get the BLAST with Plus/Minus stand orientation as follows:

enter image description here

I did reverse complement of the reference and performed alignment again but it is aligning in the same orientation. I loaded the assembly in sequencher. I had to split the assembly into two fragments this introduced a long gap in the alignment as below from 10480 bp to 10999 bp. Sequencher output: enter image description here

How can I get a complete assembly of the genome from the PE 300 reads of a circular plasmid?
Is there a way to renumber the bases in the de novo assembled sequence so that it aligns correctly without introducing a gap.

assembly de novo spades plasmids • 2.5k views

ADD COMMENT • link updated 6.2 years ago by piet ★ 1.9k • written 6.2 years ago by kspata ▴ 90

score 0 · Answer 1 · 2018-09-28

0

Entering edit mode

6.2 years ago

piet ★ 1.9k

The gap indicates that your putative plasmidic contig differs from pTPK_AAV2 by a deletion of about 500 nt. Such deletions or other recombinations are commen in plasmids. You should annotate pTPK_AAV2 and check if lost of the gap region is plausible. To verify the gap, you may map your reads to pTKP_AAV2. You should see a sharp drop in coverage between positions 10450 and 11000.

ADD COMMENT • link 6.2 years ago by piet ★ 1.9k

0

Entering edit mode

Hi Piet, Thank you for responding. I performed a resequencing analysis, in which I mapped the reads to the reference using BWA and performed variant calling apart from 1 insertion of CG at position 6354 and 2 insertions at positions 10384 of 22 bp long and 10349 of 4 bp long, there are no more variants in the consensus sequence. I also checked the depth at these positions and the average per base coverage is 8212X which I guess is high.

So does that mean that the assembler did not generate a complete assembly? If that is the case which assembler can I use? will generating assembly from merged reads work? Also, is it possible that the reason this region is not assembled is because it contains repeats? How can I check for repeats (Which tool or strategy to use to check repeats for this region?).

Additionally, you mentioned annotate the plasmid, how can I do that?

Please excuse me for long series of questions, but this is a challenging problem which I have not faced before and trying to troubleshoot.

Thanks!!!

ADD REPLY • link 6.2 years ago by kspata ▴ 90

0

Entering edit mode

the average per base coverage

I do not mean the average coverage. Please evaluate if there is any sharp drop in coverage along the sequence except for the ends. Map the reads to pTPK_AAV2 and then visually inspect the BAM file with Tablet and also run bedtools genomecov on the BAM file.

bedtools genomecov -bga -ibam myreads_on_pTPK_AAV2.bam

Is there any sharp drop in coverage, especially between positions 10400 and 11000? Is there any region with excessive coverage?

ADD REPLY • link 6.2 years ago by piet ★ 1.9k

0

Entering edit mode

annotate the plasmid, how can I do that?

If you do not have experience with software for annotation yet, then I would recommend that you do it manually as an exercise. This short plasmid presumably has 10 to 15 genes, thus manual annotation is feasible.

First determine the open reading frames. Then cut out the sequence of each open reading frame and blast it against the NCBI 'nr' nucleotide database. Finally write the results into a GFF3 file. GFF3 is a simple text format, with one line per gene.

ADD REPLY • link 6.2 years ago by piet ★ 1.9k