How to get consensus FASTA sequence from GFA assembly graph?
5.9 years ago

Hi all,

I want to assemble a region (~10 kb) of a bacterial genome. The sequenced DNA is from a single species, cultured from a single clone.

So far, I've mapped the reads (Illumina PE150) to the reference, retrieved the mapped reads (not all of them paired) with samtools, and assembled them with SPAdes in single-end mode. The assembly graph from the .gfa file, drawn with Bandage, is below:

I've manually checked the "bubbles", and in almost all of them the two paths are very similar (99%), differing only by 1-2 bp mismatches. The depths of the two paths of each bubble are split roughly half and half, so the differences are probably not sequencing errors.

By the way, the sequenced DNA is from a single species, cultured from a single clone.

So, how can I get a consensus sequence from this GFA graph? It looks somewhat like a diploid genome, but I'm not familiar with that.

Thank you in advance.

gfa consensus • 5.2k views

This works for miniasm GFA output: http://seqanswers.com/forums/showthread.php?t=64862
Not sure if it will work for Bandage. Since bacteria are haploid, those differences may represent sequencing errors or population differences.


No, that answer just reformats GFA to FASTA.
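
To be concrete, that kind of reformatting amounts to roughly the Python sketch below: it copies every segment (`S` line) of a GFA1 file into its own FASTA record and does nothing about bubbles or paths. The function name `gfa_segments_to_fasta` and the toy input are made up for illustration and are not taken from the linked thread.

```python
import io

def gfa_segments_to_fasta(gfa_lines):
    """Yield one FASTA record (as a string) per S (segment) line of a GFA1 file."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # S lines are: S <name> <sequence> [optional tags]; "*" means no sequence stored.
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            yield f">{fields[1]}\n{fields[2]}\n"

# Toy GFA text (hypothetical); with a real file, pass an open file handle instead.
example = "H\tVN:Z:1.0\nS\t1\tACGT\tKC:i:40\nS\t2\tTTGA\nL\t1\t+\t2\t+\t0M\n"
fasta = "".join(gfa_segments_to_fasta(io.StringIO(example)))
print(fasta)
```

Each segment comes out as a separate record, so both branches of every bubble are still there; this is not a consensus.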

According to the depths, the two "alleles" are split roughly half and half, so they are probably not sequencing errors.

By the way, the sequenced DNA is from a single species, cultured from a single clone.


Is Bandage able to export all possible variations of the sequences (I assume that is what you are looking for)? You can't really get a single consensus sequence if the sequence varies at every place where there is a bubble.

Sequenced DNA may be from one species but if the bacteria are under some sort of selection then perhaps you are observing various mutations being selected? What depth does half to half refer to?

5.9 years ago

I see, I have to generate the path manually. One way is:

  1. Choose one path and export its sequence.
  2. Then use one of two methods:
    1. Manually mark the sites with alleles and edit them to degenerate bases, with the help of the paths that were not chosen and exported in step 1.
    2. Map the reads to the sequence from step 1 and get the consensus sequence from the BAM file.
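
Method 1 could be sketched in Python roughly as follows. This is only an illustration under a strong assumption: the two exported paths have exactly the same length and differ by mismatches only (no indels), so they can be compared position by position without an alignment. The names `IUPAC_TWO` and `merge_paths` are hypothetical helpers, not part of Bandage or any other tool.

```python
# IUPAC ambiguity codes for every pair of different bases.
IUPAC_TWO = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def merge_paths(path_a, path_b):
    """Merge two equal-length sequences, writing mismatches as degenerate bases."""
    if len(path_a) != len(path_b):
        # Assumption violated: with indels the paths must be aligned first.
        raise ValueError("paths differ in length; align them first")
    merged = []
    for a, b in zip(path_a.upper(), path_b.upper()):
        if a == b:
            merged.append(a)
        else:
            # Anything outside A/C/G/T pairs (e.g. an N) falls back to N.
            merged.append(IUPAC_TWO.get(frozenset((a, b)), "N"))
    return "".join(merged)

# Toy sequences (made up): they differ at the third and the last position.
consensus = merge_paths("ACGTACGT", "ACATACGC")
print(consensus)  # ACRTACGY
```

If the paths do contain indels, they would need to be aligned first, and method 2 is probably the easier route.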
10 months ago
Adam Taranto ▴ 40

If you have long reads or Hi-C available you can use GraphUnzip to separate the haplotypes.

Alternatively, if the two branches of each bubble differ only by a few SNVs (rather than by large insertions), you can just delete one of the alternate contigs and join the remaining linear segments in Bandage. If the bubbles were due to errors, you might want to polish the assembly afterwards.
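
If there are many bubbles, picking the branch to delete can be scripted before opening Bandage. The Python below is a rough sketch, not a tested tool: it assumes GFA1 input, assumes per-segment depth is stored in a `DP:f:` tag (your assembler may write a different tag), and only follows forward-to-forward (`+`/`+`) links, so real graphs with reverse-strand links need more work. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def parse_gfa(text):
    """Collect per-segment depth and forward links from GFA1 text."""
    depth = {}
    succ, pred = defaultdict(set), defaultdict(set)
    for line in text.splitlines():
        f = line.split("\t")
        if f[0] == "S":
            # Assumption: depth is in a DP:f: tag; segments without it get 0.0.
            depth[f[1]] = 0.0
            for tag in f[3:]:
                if tag.startswith("DP:f:"):
                    depth[f[1]] = float(tag[5:])
        elif f[0] == "L" and f[2] == "+" and f[4] == "+":
            # Simplification: links involving reverse strands are ignored.
            succ[f[1]].add(f[3])
            pred[f[3]].add(f[1])
    return depth, dict(succ), dict(pred)

def simple_bubbles(succ, pred):
    """Return (source, branch_a, branch_b, sink) for each two-branch bubble."""
    found = []
    for source, branches in succ.items():
        for a, b in combinations(sorted(branches), 2):
            same_entry = pred.get(a) == {source} and pred.get(b) == {source}
            exit_a, exit_b = succ.get(a, set()), succ.get(b, set())
            if same_entry and len(exit_a) == 1 and exit_a == exit_b:
                found.append((source, a, b, next(iter(exit_a))))
    return found

def segments_to_delete(text):
    """For each simple bubble, name the lower-depth branch (second one on ties)."""
    depth, succ, pred = parse_gfa(text)
    return [a if depth[a] < depth[b] else b
            for _, a, b, _ in simple_bubbles(succ, pred)]

# Toy graph (made up): segments 2 and 3 form a bubble between 1 and 4.
toy_gfa = "\n".join([
    "S\t1\tAAAA\tDP:f:40.0", "S\t2\tC\tDP:f:22.0",
    "S\t3\tT\tDP:f:18.0", "S\t4\tGGGG\tDP:f:41.0",
    "L\t1\t+\t2\t+\t0M", "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M", "L\t3\t+\t4\t+\t0M",
])
to_delete = segments_to_delete(toy_gfa)
print(to_delete)  # ['3']
```

The script only lists candidate segments; deleting them and merging the remaining nodes is still done in Bandage. When the two depths are close to 50/50, the choice of branch is essentially arbitrary.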
