Hello, I received data from a lab: a plasmid that was recently sequenced with Nanopore. The plasmid is roughly 8 kb and contains 40 copies of the same insert. I have the sequences of the plasmid and of the insert.
I would like to check that the plasmid really contains the number of inserts it is supposed to (roughly 40 copies). What approach would you recommend to assemble the sequencing data for this plasmid and count the repeats?
Thanks
Great, Hugo! That's exactly what I was doing: assembling the reads with an assembler (wtdbg2), but I will give Flye a try. Then aligning reads to the reference with MUMmer.
Really helpful! Thanks.
You're welcome! :)
I do not recommend mapping reads with MUMmer. It is better suited to large assembled sequences such as contigs and chromosomes, and I have never tried it on long reads. If you want to map reads, you don't need to assemble first: just map the reads with minimap2 onto the reference you built. But I think assembling first, so that you work with a continuous consensus sequence, will correct some of the noise in the Nanopore reads and give a better result.
Sorry, I meant aligning the repeat (25 nt) to the assembled genome and visualizing the plot, or extracting the coordinates with MUMmer to get the number of mapped repeats.
Agreed: for aligning reads I would use minimap2.
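As a quick cross-check on the MUMmer coordinates, the repeat can also be counted directly in the assembled sequence. The following is only a minimal sketch in plain Python: the function names and the `max_mismatches` parameter are made up for illustration, it tolerates substitutions but not insertions or deletions, and it scans greedily, so a noisy consensus may still be undercounted.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_repeat_hits(sequence: str, repeat: str, max_mismatches: int = 0):
    """Return a list of (start, strand) for non-overlapping repeat hits.

    Both strands are checked. `max_mismatches` (hypothetical parameter)
    allows a few substitutions per copy; indels are NOT handled.
    """
    sequence = sequence.upper()
    forward = repeat.upper()
    patterns = {"+": forward}
    reverse = reverse_complement(forward)
    if reverse != forward:
        patterns["-"] = reverse

    k = len(forward)
    hits = []
    i = 0
    while i + k <= len(sequence):
        window = sequence[i:i + k]
        strand = next(
            (s for s, p in patterns.items() if hamming(window, p) <= max_mismatches),
            None,
        )
        if strand is not None:
            hits.append((i, strand))
            i += k  # jump past this copy so hits do not overlap
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Toy data only: a made-up 25 nt repeat, 40 tandem copies, poly-T flanks.
    toy_repeat = "ACGTTGCAAGGCTTAACCGGTATCG"
    toy_contig = "T" * 50 + toy_repeat * 40 + "T" * 50
    print(len(find_repeat_hits(toy_contig, toy_repeat)))
```

On a real assembly you would read the contig and the repeat from your FASTA files and compare the count with what MUMmer reports; if the two disagree, the copies with indels are the first place to look.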
By the way, the first Flye run did not work: "0 disjointigs assembled", although the coverage, number of reads and N50-90 are good.
Hmm, it seems that although Flye is better at finding plasmids in a full genome assembly, it did not do well when the dataset is made up only of plasmid reads. Let's try some variant calling instead; I think that will do. Did you make a reference like I said before (plas+repeats+mids)? If you did:
Map the reads:
Some file conversion:
From here you will need Sniffles:
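The exact commands for these three steps are not shown in this thread. As a rough sketch of what such a workflow commonly looks like, the Python snippet below builds the command lines for minimap2, samtools and Sniffles. The file names and the `prefix` and `threads` parameters are placeholders, the Sniffles call uses the Sniffles2 options (`--input`, `--vcf`; Sniffles 1 used `-m` and `-v` instead), and nothing is executed unless `dry_run=False`.

```python
import shlex
import subprocess


def build_commands(reference, reads, prefix, threads=4):
    """Build map -> sort/index -> SV-calling commands (file names are placeholders)."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    return [
        # Map Nanopore reads to the reference; --MD adds the MD tag,
        # which older Sniffles versions expect.
        ["minimap2", "-a", "-x", "map-ont", "--MD", "-t", str(threads),
         "-o", sam, reference, reads],
        # File conversion: SAM -> coordinate-sorted BAM, then index it.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # Structural variant calling (Sniffles2 syntax, assumed here).
        ["sniffles", "--input", bam, "--vcf", vcf],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands("reference.fasta", "reads.fastq", "plasmid"))
```

Check the options against the versions of the tools you have installed before running it for real.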
Now you can visualize the .vcf with Artemis or IGV, or parse it with Biopython to do a more efficient job.
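If you go the parsing route, here is a minimal sketch using only the Python standard library rather than a dedicated VCF package. It assumes the caller writes `SVTYPE` and `SVLEN` in the INFO column, as Sniffles normally does, and it converts each insertion or deletion into a number of repeat units relative to the reference. The function names and the `unit_len` parameter are hypothetical.

```python
def parse_sv_records(lines):
    """Extract INS/DEL records from VCF text lines (header lines are skipped)."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF data line
        info = {}
        for item in fields[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value
        svtype = info.get("SVTYPE")
        if svtype not in ("INS", "DEL"):
            continue
        try:
            # Some callers report deletions with a negative SVLEN.
            length = abs(int(info.get("SVLEN", "")))
        except ValueError:
            continue
        records.append({
            "chrom": fields[0],
            "pos": int(fields[1]),
            "svtype": svtype,
            "length": length,
        })
    return records


def copy_change(record, unit_len):
    """Signed number of repeat units gained (INS) or lost (DEL) vs. the reference."""
    sign = 1 if record["svtype"] == "INS" else -1
    return sign * record["length"] / unit_len


if __name__ == "__main__":
    # Made-up example lines, not real results.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "plasmid\t1200\tsv1\tN\t<INS>\t60\tPASS\tSVTYPE=INS;SVLEN=250",
    ]
    for rec in parse_sv_records(example):
        print(rec["pos"], rec["svtype"], copy_change(rec, unit_len=25))
```

With a reference that carries 40 copies, an insertion of 250 bp in the repeat array and a 25 nt unit would suggest about 10 extra copies in the reads, and a deletion would suggest fewer.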