Hi there
I have a set of contigs and am trying to identify the unique sequences in them, ideally the longest possible ones. I have already identified and extracted the unique kmers (length=32) from these contigs, and I would like to reassemble them into the original sequences they came from, in order to identify the longest possible genomic regions. Example:
##kmers extracted
>p4598
GCAGCGAAATAACGCGATATAACATGCTAAGG
>p4599
CAGCGAAATAACGCGATATAACATGCTAAGGA
>p4600
AGCGAAATAACGCGATATAACATGCTAAGGAA
>p4601
GCGAAATAACGCGATATAACATGCTAAGGAAG
>p4602
CGAAATAACGCGATATAACATGCTAAGGAAGG
>p4603
GAAATAACGCGATATAACATGCTAAGGAAGGT
>p4604
AAATAACGCGATATAACATGCTAAGGAAGGTG
>p4605
AATAACGCGATATAACATGCTAAGGAAGGTGC
>p4606
ATAACGCGATATAACATGCTAAGGAAGGTGCG
>p4607
TAACGCGATATAACATGCTAAGGAAGGTGCGA
>p4608
AACGCGATATAACATGCTAAGGAAGGTGCGAA
>p4609
ACGCGATATAACATGCTAAGGAAGGTGCGAAT
>p4610
CGCGATATAACATGCTAAGGAAGGTGCGAATA
>p4611
GCGATATAACATGCTAAGGAAGGTGCGAATAA
>p4612
CGATATAACATGCTAAGGAAGGTGCGAATAAG
>p4613
GATATAACATGCTAAGGAAGGTGCGAATAAGC
>p4614
ATATAACATGCTAAGGAAGGTGCGAATAAGCG
>p4615
TATAACATGCTAAGGAAGGTGCGAATAAGCGG
>p4616
ATAACATGCTAAGGAAGGTGCGAATAAGCGGG
>p4617
TAACATGCTAAGGAAGGTGCGAATAAGCGGGG
>p4618
AACATGCTAAGGAAGGTGCGAATAAGCGGGGA
>p4619
ACATGCTAAGGAAGGTGCGAATAAGCGGGGAA
>p4620
CATGCTAAGGAAGGTGCGAATAAGCGGGGAAA
>p4621
ATGCTAAGGAAGGTGCGAATAAGCGGGGAAAT
>p4622
TGCTAAGGAAGGTGCGAATAAGCGGGGAAATT
These kmers can be assembled into:
>merged
GCAGCGAAATAACGCGATATAACATGCTAAGGAAGGTGCGAATAAGCGGGGAAATT
But how can I do that? If I align them, I can get the consensus sequence, but I cannot do that one by one because I have thousands of files like this. I tried to reassemble them with megahit using "-r kmers.fasta --min-contig-len 50", but that only works well in some cases (I suspect when there are a lot of kmers from the same region).
Are there any easy ways to achieve that? Thanks
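For what it's worth, the merge described in the example can be sketched in a few lines of Python. This is only a minimal illustration, not what megahit or any other tool does internally: it assumes all kmers have the same length, are on the same strand (reverse complements are ignored), and it only joins two kmers when the k-1 overlap is unambiguous (exactly one successor and one predecessor). The function name merge_kmers is made up for this sketch.

```python
from collections import defaultdict


def merge_kmers(kmers):
    """Merge k-mers that overlap by k-1 bases into maximal unambiguous sequences.

    Assumptions of this sketch: all k-mers share the same length, they are
    all on the same strand, and branching overlaps are never merged.
    """
    kmers = set(kmers)
    by_prefix = defaultdict(list)  # (k-1)-prefix -> k-mers starting with it
    by_suffix = defaultdict(list)  # (k-1)-suffix -> k-mers ending with it
    for km in kmers:
        by_prefix[km[:-1]].append(km)
        by_suffix[km[1:]].append(km)

    def next_kmer(km):
        # The only k-mer that can follow km, provided km is also its only predecessor.
        succ = by_prefix.get(km[1:], [])
        if len(succ) == 1 and len(by_suffix[km[1:]]) == 1:
            return succ[0]
        return None

    def prev_kmer(km):
        # The only k-mer that can precede km, provided km is also its only successor.
        pred = by_suffix.get(km[:-1], [])
        if len(pred) == 1 and len(by_prefix[km[:-1]]) == 1:
            return pred[0]
        return None

    merged, seen = [], set()
    for km in sorted(kmers):
        if km in seen:
            continue
        # Walk backwards to the first k-mer of this unambiguous path.
        start, visited = km, {km}
        while True:
            p = prev_kmer(start)
            if p is None or p in visited:  # path start reached, or a cycle
                break
            start = p
            visited.add(p)
        # Walk forwards, appending one base per k-mer.
        seq, cur = start, start
        seen.add(start)
        while True:
            n = next_kmer(cur)
            if n is None or n in seen:
                break
            seq += n[-1]
            seen.add(n)
            cur = n
        merged.append(seq)
    return merged


# Three of the kmers from the example above (p4598, p4599, p4600).
example = [
    "GCAGCGAAATAACGCGATATAACATGCTAAGG",
    "CAGCGAAATAACGCGATATAACATGCTAAGGA",
    "AGCGAAATAACGCGATATAACATGCTAAGGAA",
]
print(merge_kmers(example))
```

Run on all 25 kmers from the example, this should give back the single merged sequence shown above. Kmers from regions that branch or are not contiguous end up as separate sequences, so the output is a list.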
That looks like exactly what I want, but I cannot get map to work! Any ideas why?
I am using the latest version, 0.19.1, but map doesn't seem to exist in the functions list.
It seems like I forgot to release the new version. Please replace "unikmer map" with "unikmer uniqs".
--- update ---
New version released.
YES, that's excellent!