Hi All,
I have a circular plasmid sample which was sequenced on MiSeq PE 300. I performed a de novo aseembly using SPAdes.
The contig which I obtained is 11526 bp long. The reference sequence for this plasmid is 12089bp long. When I BLAST the reference sequence against the de novo assembled contig I get the BLAST with Plus/Minus stand orientation as follows:
I did reverse complement of the reference and performed alignment again but it is aligning in the same orientation. I loaded the assembly in sequencher. I had to split the assembly into two fragments this introduced a long gap in the alignment as below from 10480 bp to 10999 bp. Sequencher output:
- How can I get a complete assembly of the genome from the PE 300 reads of a circular plasmid?
- Is there a way to renumber the bases in the de novo assembled sequence so that it aligns correctly without introducing a gap.
Hi Piet, Thank you for responding. I performed a resequencing analysis, in which I mapped the reads to the reference using BWA and performed variant calling apart from 1 insertion of CG at position 6354 and 2 insertions at positions 10384 of 22 bp long and 10349 of 4 bp long, there are no more variants in the consensus sequence. I also checked the depth at these positions and the average per base coverage is 8212X which I guess is high.
So does that mean that the assembler did not generate a complete assembly? If that is the case which assembler can I use? will generating assembly from merged reads work? Also, is it possible that the reason this region is not assembled is because it contains repeats? How can I check for repeats (Which tool or strategy to use to check repeats for this region?).
Additionally, you mentioned annotate the plasmid, how can I do that?
Please excuse me for long series of questions, but this is a challenging problem which I have not faced before and trying to troubleshoot.
Thanks!!!
I do not mean the average coverage. Please evaluate if there is any sharp drop in coverage along the sequence except for the ends. Map the reads to pTPK_AAV2 and then visually inspect the BAM file with Tablet and also run bedtools genomecov on the BAM file.
Is there any sharp drop in coverage, especially between positions 10400 and 11000? Is there any region with excessive coverage?
If you do not have experience with software for annotation yet, then I would recommend that you do it manually as an exercise. This short plasmid presumably has 10 to 15 genes, thus manual annotation is feasible.
First determine the open reading frames. Then cut out the sequence of each open reading frame and blast it against the NCBI 'nr' nucleotide database. Finally write the results into a GFF3 file. GFF3 is a simple text format, with one line per gene.