Question

How to get consensus FASTA sequence from GFA assembly graph?

2

6.0 years ago

shenwei356 8.7k

Hi all,

I want to assemble a region (~10 kb) of a bacterial genome. The sequenced DNA is from a single species, cultured from a single clone.

So far, I have mapped the reads (Illumina PE150) to a reference, retrieved the mapped reads (not all of them paired) with samtools, and assembled them with SPAdes in single-end mode. The assembly graph, drawn by Bandage from the .gfa file, is below:

I've manually checked the "bubbles". In almost all of them the two paths are very similar (about 99% identical), differing by only 1-2 bp mismatches. The depths of the two paths in each bubble are split roughly half and half, so the differences are probably not sequencing errors.

So, how can I get a consensus sequence from this GFA graph? It looks somewhat like a diploid genome, which I'm not familiar with.

Thank you in advance.

gfa consensus • 5.3k views

ADD COMMENT • link updated 11 months ago by Adam Taranto ▴ 40 • written 6.0 years ago by shenwei356 8.7k

2

This works for miniasm GFA output: http://seqanswers.com/forums/showthread.php?t=64862
Not sure if it will work for the graph shown in Bandage. Since bacteria are haploid, could those differences represent sequencing errors or population differences?
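
For context, a plain GFA-to-FASTA conversion boils down to roughly the following Python sketch. The function name `gfa_to_fasta` is made up for illustration, and the sketch assumes GFA1 segment lines of the form `S <name> <sequence> [tags]`. Each segment becomes its own FASTA record, so bubbles are not resolved in any way.

```python
def gfa_to_fasta(gfa_text):
    """Return FASTA text with one record per GFA segment (S line).

    Hypothetical helper for illustration; links (L lines) are ignored,
    so both branches of every bubble end up as separate records.
    """
    records = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # Keep only segment lines that actually carry a sequence ('*' means absent).
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            records.append(f">{fields[1]}\n{fields[2]}")
    return "\n".join(records) + ("\n" if records else "")


example = "H\tVN:Z:1.0\nS\t1\tACGT\tKC:i:40\nS\t2\tTTGA\nL\t1\t+\t2\t+\t0M\n"
print(gfa_to_fasta(example), end="")
```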

ADD REPLY • link 6.0 years ago by GenoMax 148k

1

No, that answer just reformats GFA to FASTA.

According to the depths, the two "alleles" are split almost half and half, so they are probably not sequencing errors.

By the way, the sequenced DNA is from a single species, cultured from a single clone.

ADD REPLY • link 6.0 years ago by shenwei356 8.7k

0

Is Bandage able to export all possible variations of the sequences (I assume that is what you are looking for)? You can't really get a single consensus sequence if you have variation at every place where there is a bubble.

The sequenced DNA may be from one species, but if the bacteria are under some sort of selection, perhaps you are observing various mutations being selected? What depth does "half to half" refer to?

ADD REPLY • link 6.0 years ago by GenoMax 148k

score 1 · Answer 1 · 2018-12-17

I see, I have to generate the path manually. One way is:

1. Choose one path and export its sequence.
2. Then use one of two possible methods:
   a. Manually mark and edit the sites with alleles as degenerate bases, with the help of the paths that were not chosen and exported in step 1.
   b. Map the reads to the sequence from step 1 and get a consensus sequence from the BAM file.
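
The degenerate-base option can be sketched in Python as below. This is a minimal illustration, not a tested pipeline: `degenerate_consensus` is a hypothetical helper, and it assumes the alternate bubble paths have equal length and differ only by substitutions (no indels), with bases limited to A, C, G and T.

```python
# IUPAC nucleotide codes keyed by the set of bases observed at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(*seqs):
    """Merge equal-length sequences column by column into IUPAC codes."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length (substitutions only)")
    columns = zip(*(s.upper() for s in seqs))
    # A KeyError here means a base other than A/C/G/T was encountered.
    return "".join(IUPAC[frozenset(col)] for col in columns)


# Two alternate paths through one bubble, differing at a single site (G vs A).
print(degenerate_consensus("ACGTA", "ACATA"))
```

Bubbles that contain insertions or deletions would need an alignment of the two paths first, which this sketch does not attempt.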

score 1 · Answer 2 · 2024-01-18

If you have long reads or Hi-C available you can use GraphUnzip to separate the haplotypes.

Alternatively, if the branches of each bubble differ by only a few SNVs (rather than by large insertions), you can just delete one of the alternate contigs and join the remaining linear segments in Bandage. If the bubbles were due to errors, you might want to polish the assembly afterwards.
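
If there are many bubbles, picking which alternate contig to delete can be scripted before opening Bandage. The Python sketch below is only a rough illustration under several assumptions: the hypothetical function `branches_to_drop` handles only simple bubbles (branches with exactly one link at each end, sharing the same flanking segments), and it assumes depth is stored on the segment lines as a `DP:f:` tag or as a `KC:i:` k-mer count, which it divides by segment length. Check which tags your assembler actually writes.

```python
from collections import defaultdict

FLIP = {"+": "-", "-": "+"}


def branches_to_drop(gfa_text):
    """Return names of the lower-depth branches of each simple bubble."""
    depth = {}
    out = defaultdict(set)  # oriented segment -> oriented segments it links to
    for line in gfa_text.splitlines():
        f = line.split("\t")
        if f[0] == "S" and len(f) >= 3:
            tags = {t[:2]: t[5:] for t in f[3:]}
            if "DP" in tags:
                depth[f[1]] = float(tags["DP"])
            elif "KC" in tags and f[2] not in ("", "*"):
                depth[f[1]] = float(tags["KC"]) / len(f[2])
            else:
                depth[f[1]] = 0.0  # no depth information found
        elif f[0] == "L" and len(f) >= 5:
            a, oa, b, ob = f[1], f[2], f[3], f[4]
            out[(a, oa)].add((b, ob))
            # Every link can also be traversed on the reverse strand.
            out[(b, FLIP[ob])].add((a, FLIP[oa]))
    groups = defaultdict(list)
    for name in depth:
        right, left = out[(name, "+")], out[(name, "-")]
        if len(right) == 1 and len(left) == 1:
            # Key each branch by the two oriented flanks it connects.
            groups[frozenset(right | left)].append(name)
    drop = []
    for members in groups.values():
        if len(members) >= 2:
            keep = max(members, key=lambda n: depth[n])
            drop.extend(n for n in members if n != keep)
    return sorted(drop)


example_gfa = "\n".join([
    "S\t1\tACGTACGT\tDP:f:40", "S\t2\tA\tDP:f:21",
    "S\t3\tG\tDP:f:19", "S\t4\tTTGACCAA\tDP:f:40",
    "L\t1\t+\t2\t+\t0M", "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M", "L\t3\t+\t4\t+\t0M",
])
print(branches_to_drop(example_gfa))
```

With depths split close to half and half, as in the question, the choice of branch is essentially arbitrary, so the list is best treated as a starting point for manual review.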