6.2 years ago
h.mon
See the following examples of contig headers from a Tadpole assembly:
>contig_7,length=542,cov=6.2,min=4,max=8,gc=0.555,left=DEAD_END,right=DEAD_END
>contig_4,length=506,cov=1821.6,min=1396,max=2445,gc=0.571,left=B_BRANCH_2.7,right=B_BRANCH_1.1
>contig_1,length=4317,cov=42.3,min=21,max=89,gc=0.512,left=F_BRANCH_2.7,right=F_BRANCH_1.7
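As a minimal sketch (plain Python, not part of BBTools), the headers above can be split into their fields. This assumes every header is the contig name followed by comma-separated key=value pairs, as in these three examples. The function name parse_tadpole_header is hypothetical.

```python
def parse_tadpole_header(header: str) -> dict:
    """Split a Tadpole-style FASTA header into its name and key=value fields.

    Assumption: the header looks like the examples above, i.e.
    ">name,key=value,key=value,...".
    """
    name, *fields = header.lstrip(">").strip().split(",")
    record = {"name": name}
    for field in fields:
        key, _, value = field.partition("=")
        # Convert numeric fields (length, cov, min, max, gc);
        # leave end codes such as DEAD_END or F_BRANCH_2.7 as strings.
        try:
            record[key] = int(value)
        except ValueError:
            try:
                record[key] = float(value)
            except ValueError:
                record[key] = value
    return record


if __name__ == "__main__":
    h = (">contig_4,length=506,cov=1821.6,min=1396,max=2445,"
         "gc=0.571,left=B_BRANCH_2.7,right=B_BRANCH_1.1")
    rec = parse_tadpole_header(h)
    print(rec["length"], rec["cov"], rec["left"], rec["right"])
```

This only reads the fields; it says nothing about what the left/right codes mean.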
What is the meaning of the left= and right= fields of the header?
My understanding is that these fields record why the kmer graph extension stopped at the left and right ends of each contig: DEAD_END (no further extension possible in the kmer graph) and _BRANCH_ (the kmer graph cannot be extended unambiguously). But what do the F_ and B_ prefixes, and the numeric suffixes, on _BRANCH_ mean? Are they indicating possible linkage paths?
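To tally these codes across an assembly, a small helper can split each end code into its pieces without interpreting them. This is a hypothetical sketch: the name split_end_code and the assumed pattern "<letter>_BRANCH_<number>" come only from the three example headers, and the meaning of the F_/B_ prefixes and of the numbers is exactly the open question, so the code deliberately leaves them uninterpreted.

```python
import re
from typing import Optional, Tuple

# Assumed pattern, inferred from the examples: one letter, "_BRANCH_", a number.
_BRANCH_RE = re.compile(r"^([A-Z])_BRANCH_(\d+(?:\.\d+)?)$")


def split_end_code(code: str) -> Tuple[Optional[str], str, Optional[float]]:
    """Split an end code into (prefix, kind, numeric suffix).

    Codes matching the assumed BRANCH pattern become e.g. ("F", "BRANCH", 2.7);
    anything else (such as DEAD_END) is returned as the kind with no prefix/suffix.
    No meaning is assigned to the prefix or the number.
    """
    m = _BRANCH_RE.match(code)
    if m:
        return m.group(1), "BRANCH", float(m.group(2))
    return None, code, None


if __name__ == "__main__":
    for code in ["DEAD_END", "F_BRANCH_2.7", "B_BRANCH_1.1"]:
        print(code, "->", split_end_code(code))
```

Grouping contigs by prefix and suffix this way may help spot patterns, but it does not answer what the fields mean.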
I am not sure; perhaps you need to contact Brian Bushnell (the developer) directly. But I am curious whether you are assembling a haploid genome (okay to use Tadpole) or a diploid or higher-ploidy genome (Tadpole is not recommended there, as it doesn't yet, as far as I know, have the ability to deal with sequence heterozygosity).
As there are many BBTools users here, I thought someone might know (or have an educated guess) what these header fields mean. I will try opening an issue on the BBTools SourceForge page. Brian Bushnell was quite active here and at SeqAnswers until a few months ago, but recently stopped posting. His tools have very good and thorough documentation, so I like them very much, but these header fields are not described.
I am assembling a haploid cosmid. It has inverted repeats that have thrown off every assembler I have tried so far.
I wouldn't say it is not okay to use tadpole for diploid genomes: one would get a highly fragmented and duplicated assembly, with very few or no errors. Most of the time, we want the most contiguous haploid representation of a genome, but there may be cases where it is fine, or maybe even needed, to have a highly redundant assembly.
Yes, I agree. That is why I said "not recommended", but as you pointed out, there are indeed good uses for it with diploid or higher-ploidy genomes. It is super fast and resource efficient.