Tadpole assembler header
6.2 years ago
h.mon 35k

See the following examples of contig headers from a Tadpole assembly:

>contig_7,length=542,cov=6.2,min=4,max=8,gc=0.555,left=DEAD_END,right=DEAD_END
>contig_4,length=506,cov=1821.6,min=1396,max=2445,gc=0.571,left=B_BRANCH_2.7,right=B_BRANCH_1.1
>contig_1,length=4317,cov=42.3,min=21,max=89,gc=0.512,left=F_BRANCH_2.7,right=F_BRANCH_1.7

What is the meaning of the left= and right= fields of the header?

My understanding is that these fields record why the k-mer graph was interrupted at the left and right ends of each contig: DEAD_END (no further extension is possible on the k-mer graph) and _BRANCH_ (the k-mer graph cannot be extended unambiguously). But what do the F_ and B_ prefixes and the numeric suffixes on the _BRANCH_ values mean? Are they indicating possible linkage paths?
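For anyone who wants to tabulate these fields while the meaning is being sorted out, here is a minimal Python sketch that splits a header of the form shown above into its comma-separated `key=value` fields. The function names `parse_tadpole_header` and `split_terminus` are hypothetical helpers, not part of BBTools, and the sketch assumes every header follows the layout of the three examples. It only separates a terminus value into its label and trailing number; it makes no claim about what the F_/B_ prefixes or the numbers mean.

```python
import re


def parse_tadpole_header(header):
    """Split a Tadpole contig header into a dict of fields.

    Assumes the layout seen in the examples: a contig name followed by
    comma-separated key=value pairs.
    """
    name, *pairs = header.strip().lstrip(">").split(",")
    fields = {"name": name}
    for pair in pairs:
        key, value = pair.split("=", 1)
        fields[key] = value
    # Convert the numeric fields; left/right stay as raw strings.
    for key in ("length", "min", "max"):
        if key in fields:
            fields[key] = int(fields[key])
    for key in ("cov", "gc"):
        if key in fields:
            fields[key] = float(fields[key])
    return fields


def split_terminus(code):
    """Separate a left=/right= value into (label, number).

    'B_BRANCH_2.7' gives ('B_BRANCH', 2.7); a value without a trailing
    number, such as 'DEAD_END', gives ('DEAD_END', None).
    The number is returned uninterpreted.
    """
    match = re.fullmatch(r"(.+?)_(\d+(?:\.\d+)?)", code)
    if match:
        return match.group(1), float(match.group(2))
    return code, None


if __name__ == "__main__":
    example = (">contig_4,length=506,cov=1821.6,min=1396,max=2445,"
               "gc=0.571,left=B_BRANCH_2.7,right=B_BRANCH_1.1")
    record = parse_tadpole_header(example)
    print(record["name"], record["length"], split_terminus(record["left"]))
```

Counting how often each label occurs at contig ends (for example with `collections.Counter`) could at least show how much of the fragmentation comes from branches rather than dead ends.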

bbmap bbtools tadpole • 1.5k views

I am not sure; you may need to contact Brian Bushnell (the developer) directly. I am curious, though: are you assembling a haploid genome (fine for Tadpole) or a diploid or higher-ploidy genome (not recommended for Tadpole, since, as far as I know, it cannot yet deal with sequence heterozygosity)?


As there are many BBTools users here, I thought someone might know (or have an educated guess about) the header descriptions. I will try opening an issue on the BBTools SourceForge page. Brian Bushnell was quite active here and at SEQanswers until a few months ago, but recently stopped posting. His tools have very good and thorough documentation, so I like them very much, but these header descriptions are missing.

I am assembling a haploid cosmid. It has inverted repeats, which have thrown off every assembler I have tried so far.

I wouldn't say it is not okay to use tadpole for diploid genomes: one would get a highly fragmented and duplicated assembly, with very few or no errors. Most of the time, we want the most contiguous haploid representation of a genome, but there may be cases where it is fine, or maybe even needed, to have a highly redundant assembly.


Yes, I agree. That is why I said "not recommended", but as you pointed out, there are indeed good uses for it with diploid or higher-ploidy genomes. It is super fast and resource efficient.
