Hi all,
I want to assemble a region (~10 kb) of a bacterial genome. The sequenced DNA is from a single species, cultured from a single clone.
So far, I've mapped the reads (Illumina PE150) to a reference, extracted the mapped reads with samtools (not all of them are still paired), assembled them with SPAdes in single-end mode, and visualized the resulting .gfa assembly graph in Bandage. The graph is below:
I've manually checked the "bubbles", and almost all of them are very similar (~99% identity), differing only by 1-2 bp mismatches. The depths of the two paths through each bubble are split roughly half and half, so they are probably not sequencing errors.
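One rough way to test the "not sequencing error" reasoning is to ask how likely the minor path's depth would be if random errors alone produced it. The Python sketch below does that with a binomial upper tail. The 1% per-base error rate is an assumed placeholder, and SPAdes/Bandage depths are k-mer coverages rather than read counts, so treat the result as a rough indicator only; the function names are hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def bubble_depth_check(depth_a: float, depth_b: float, error_rate: float = 0.01):
    """Return (minor-path fraction, P(minor depth this high | errors only)).

    error_rate is an assumed per-base error rate, not a measured value.
    """
    n = round(depth_a + depth_b)
    if n == 0:
        raise ValueError("both paths have zero depth")
    k = round(min(depth_a, depth_b))
    return k / n, binomial_upper_tail(k, n, error_rate)
```

For example, `bubble_depth_check(48, 52)` gives a minor fraction of 0.48 with a vanishingly small error-only probability, whereas `bubble_depth_check(99, 1)` is entirely consistent with errors.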
So, how can I get a consensus sequence from this GFA graph? It looks a bit like a diploid genome, but I'm not familiar with handling that.
Thank you in advance.
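A minimal sketch of one way to collapse the bubbles: walk the graph from a chosen start segment and, wherever the path splits, follow the branch with the higher depth. This greedy heuristic is swapped in for illustration; it is not a Bandage or SPAdes feature. It assumes depth is stored as a `DP:f` tag or, failing that, a `KC:i` k-mer count (check which one your .gfa actually contains), that link overlaps are plain `<n>M` CIGARs, and that the region is a simple chain of bubbles without repeats. For a ~50/50 bubble the pick is essentially arbitrary, so the output is one plausible haplotype rather than a true consensus. All function names are hypothetical.

```python
import re

# Complement table for reverse-complementing segment sequences.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def flip(orient: str) -> str:
    return "-" if orient == "+" else "+"


def parse_gfa(text: str):
    """Collect segment sequences, per-segment depth, and oriented successors."""
    seqs, depth, succ = {}, {}, {}
    for line in text.splitlines():
        f = line.split("\t")
        if f[0] == "S":
            name, seq = f[1], f[2]
            tags = {}
            for tag in f[3:]:
                key, _type, value = tag.split(":", 2)
                tags[key] = value
            seqs[name] = seq
            if "DP" in tags:            # mean depth, if the assembler wrote one
                depth[name] = float(tags["DP"])
            elif "KC" in tags:          # k-mer count -> approximate depth
                depth[name] = float(tags["KC"]) / max(len(seq), 1)
            else:
                depth[name] = 0.0
        elif f[0] == "L":
            a, a_or, b, b_or, cigar = f[1:6]
            m = re.fullmatch(r"(\d+)M", cigar)
            overlap = int(m.group(1)) if m else 0
            # A link a(a_or) -> b(b_or) also implies b(flipped) -> a(flipped).
            succ.setdefault((a, a_or), []).append((b, b_or, overlap))
            succ.setdefault((b, flip(b_or)), []).append((a, flip(a_or), overlap))
    return seqs, depth, succ


def greedy_major_path(text: str, start: str, orient: str = "+"):
    """Follow the highest-depth unvisited successor until the path ends."""
    seqs, depth, succ = parse_gfa(text)

    def oriented(name: str, o: str) -> str:
        return seqs[name] if o == "+" else revcomp(seqs[name])

    node = (start, orient)
    path, parts, visited = [start + orient], [oriented(start, orient)], {start}
    while True:
        options = [s for s in succ.get(node, []) if s[0] not in visited]
        if not options:
            break
        name, o, overlap = max(options, key=lambda s: depth[s[0]])
        parts.append(oriented(name, o)[overlap:])  # drop bases shared with the previous segment
        path.append(name + o)
        visited.add(name)
        node = (name, o)
    return path, "".join(parts)
```

Usage would look like `path, seq = greedy_major_path(open("graph.gfa").read(), "1")`, where `graph.gfa` and `"1"` stand in for your own file and for a segment ID at one end of the region.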
This works for miniasm GFA output: http://seqanswers.com/forums/showthread.php?t=64862
Not sure if it will work for Bandage. Since bacteria are haploid, those differences may represent sequencing errors or population differences?
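Converting GFA to FASTA essentially means writing each `S` (segment) line out as a FASTA record. A minimal Python equivalent is sketched below (the function name is hypothetical). Note that it does not resolve anything: both sides of every bubble come out as separate records.

```python
def gfa_to_fasta(gfa_text: str, width: int = 60) -> str:
    """Write every GFA segment (S line) as a FASTA record, wrapped at `width`."""
    out = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # S lines: S <name> <sequence> [tags...]; '*' means no sequence stored.
        if fields[0] == "S" and len(fields) >= 3 and fields[2] != "*":
            out.append(">" + fields[1])
            seq = fields[2]
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```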
No, that answer just converts GFA to FASTA.
According to the depths, the two "alleles" are split roughly half and half, so they are probably not sequencing errors.
By the way, the sequenced DNA is from a single species, cultured from a single clone.
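If both "alleles" at a bubble are real and differ only by substitutions, one common way to still write a single sequence is to put an IUPAC ambiguity code at each differing position, as is often done for diploid consensus sequences. The sketch below assumes you have already pulled the two equal-length bubble paths out as strings; the function name is hypothetical, and indels are deliberately not handled.

```python
# Two-base IUPAC ambiguity codes (plus the four unambiguous bases).
IUPAC_PAIR = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def iupac_merge(allele_a: str, allele_b: str) -> str:
    """Merge two equal-length bubble alleles into one IUPAC-coded string."""
    if len(allele_a) != len(allele_b):
        raise ValueError("only substitution-only bubbles (equal lengths) are handled")
    return "".join(
        IUPAC_PAIR.get(frozenset((x, y)), "N")  # anything unexpected becomes N
        for x, y in zip(allele_a.upper(), allele_b.upper())
    )
```

For instance, alleles `GGA` and `GTA` merge to `GKA`, keeping both variants visible in one sequence.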
Is Bandage able to export all possible variants of the sequence (I assume that is what you are looking for)? You can't really get a single consensus sequence if there is variation at every bubble.
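To see why, note that every two-way bubble doubles the number of distinct end-to-end sequences, so n bubbles give 2**n variants. The sketch below enumerates them for a region laid out as a hypothetical list of blocks (a plain string for shared sequence, a tuple for the alternatives at a bubble); the layout and function names are assumptions for illustration, not a Bandage export format.

```python
import math
from itertools import product


def all_variants(blocks):
    """Every sequence obtainable by choosing one alternative per bubble.

    `blocks` is a hypothetical linear layout of the region: a plain str is a
    stretch shared by all paths, a tuple holds the alternative bubble paths.
    """
    choices = [(b,) if isinstance(b, str) else tuple(b) for b in blocks]
    return ["".join(combo) for combo in product(*choices)]


def count_variants(blocks) -> int:
    """Number of variants without building them (2**n for n two-way bubbles)."""
    return math.prod(1 if isinstance(b, str) else len(b) for b in blocks)
```

With 20 two-way bubbles, `count_variants` already reports 1,048,576 possible sequences, which is why exporting "all variants" quickly stops being practical.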
Sequenced DNA may be from one species, but if the bacteria are under some sort of selection, perhaps you are observing various mutations being selected? What depth does "half to half" refer to?