assemble extracted kmers into longer contigs/sequences
5 months ago
sapuizait ▴ 10

Hi there

I have a set of contigs and am trying to identify the unique sequences in them, ideally the longest possible ones. I have already identified and extracted the unique k-mers from these contigs (k = 32), and I would like to reassemble them into the original sequences they came from in order to identify the longest possible genomic regions. Example:

##kmers extracted
>p4598
GCAGCGAAATAACGCGATATAACATGCTAAGG
>p4599
CAGCGAAATAACGCGATATAACATGCTAAGGA
>p4600
AGCGAAATAACGCGATATAACATGCTAAGGAA
>p4601
GCGAAATAACGCGATATAACATGCTAAGGAAG
>p4602
CGAAATAACGCGATATAACATGCTAAGGAAGG
>p4603
GAAATAACGCGATATAACATGCTAAGGAAGGT
>p4604
AAATAACGCGATATAACATGCTAAGGAAGGTG
>p4605
AATAACGCGATATAACATGCTAAGGAAGGTGC
>p4606
ATAACGCGATATAACATGCTAAGGAAGGTGCG
>p4607
TAACGCGATATAACATGCTAAGGAAGGTGCGA
>p4608
AACGCGATATAACATGCTAAGGAAGGTGCGAA
>p4609
ACGCGATATAACATGCTAAGGAAGGTGCGAAT
>p4610
CGCGATATAACATGCTAAGGAAGGTGCGAATA
>p4611
GCGATATAACATGCTAAGGAAGGTGCGAATAA
>p4612
CGATATAACATGCTAAGGAAGGTGCGAATAAG
>p4613
GATATAACATGCTAAGGAAGGTGCGAATAAGC
>p4614
ATATAACATGCTAAGGAAGGTGCGAATAAGCG
>p4615
TATAACATGCTAAGGAAGGTGCGAATAAGCGG
>p4616
ATAACATGCTAAGGAAGGTGCGAATAAGCGGG
>p4617
TAACATGCTAAGGAAGGTGCGAATAAGCGGGG
>p4618
AACATGCTAAGGAAGGTGCGAATAAGCGGGGA
>p4619
ACATGCTAAGGAAGGTGCGAATAAGCGGGGAA
>p4620
CATGCTAAGGAAGGTGCGAATAAGCGGGGAAA
>p4621
ATGCTAAGGAAGGTGCGAATAAGCGGGGAAAT
>p4622
TGCTAAGGAAGGTGCGAATAAGCGGGGAAATT

These k-mers can be assembled into:

>merged
GCAGCGAAATAACGCGATATAACATGCTAAGGAAGGTGCGAATAAGCGGGGAAATT

But how can I do that? If I align them I can get the consensus sequence, but I cannot do that one file at a time because I have thousands of files like this. I tried to reassemble them with megahit using "-r kmers.fasta --min-contig-len 50", but that works well only in some cases (I suspect when there are a lot of k-mers from the same region).

Is there an easy way to achieve that? Thanks.
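
For what it is worth, the merge shown in the example can be sketched in a few lines of Python: chain k-mers that overlap by k-1 bases, and stop wherever the overlap is ambiguous (more than one possible neighbour). This is only a minimal sketch, not how megahit or any other assembler works. `merge_kmers` is a hypothetical helper name, and it assumes that all k-mers have the same length, that they come from the same strand (no reverse complements), and that they fit in memory.

```python
from collections import defaultdict


def merge_kmers(kmers):
    """Chain k-mers that overlap by k-1 bases into maximal unbranched sequences.

    Assumption: equal-length k-mers, forward strand only.
    """
    kmers = list(dict.fromkeys(kmers))  # drop duplicates, keep input order
    by_prefix = defaultdict(list)  # (k-1)-prefix -> k-mers starting with it
    by_suffix = defaultdict(list)  # (k-1)-suffix -> k-mers ending with it
    for km in kmers:
        by_prefix[km[:-1]].append(km)
        by_suffix[km[1:]].append(km)

    def successor(km):
        # The only k-mer that can follow km, provided km is also its only predecessor.
        nxt = by_prefix.get(km[1:], [])
        if len(nxt) == 1 and len(by_suffix[nxt[0][:-1]]) == 1:
            return nxt[0]
        return None

    def predecessor(km):
        # The only k-mer that can precede km, provided km is also its only successor.
        prev = by_suffix.get(km[:-1], [])
        if len(prev) == 1 and len(by_prefix[prev[0][1:]]) == 1:
            return prev[0]
        return None

    used = set()
    merged = []
    for km in kmers:
        if km in used:
            continue
        # Walk left to the first k-mer of this unbranched path.
        start, seen = km, {km}
        while True:
            p = predecessor(start)
            if p is None or p in seen or p in used:
                break
            start = p
            seen.add(p)
        # Walk right, adding one base per k-mer.
        seq, cur = start, start
        used.add(start)
        while True:
            n = successor(cur)
            if n is None or n in used:
                break
            seq += n[-1]
            used.add(n)
            cur = n
        merged.append(seq)
    return merged


if __name__ == "__main__":
    # The merged sequence from the example, cut back into 32-mers.
    region = "GCAGCGAAATAACGCGATATAACATGCTAAGGAAGGTGCGAATAAGCGGGGAAATT"
    k = 32
    example_kmers = [region[i:i + k] for i in range(len(region) - k + 1)]
    print(merge_kmers(example_kmers))
```

For thousands of files, the sequence lines of each FASTA file (headers dropped) could be passed to `merge_kmers`, one call per file.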

assembly unique kmers • 541 views
5 months ago

Use unikmer.

# convert kmers into the binary format needed by unikmer
$ grep -v '>' kmers.fasta | unikmer dump -s -K -o kmers.unik

$ cat ref.fasta
>ref
CAGAGCGCAGCGAAATAACGCGATATAACATGCTAAGGAAGGTGCGAATAAGCGGGGAAATTACGACGA

# mapping back to the reference
$ unikmer map -M -m 32 -g ref.fasta kmers.unik -a 
>ref:7-62
GCAGCGAAATAACGCGATATAACATGCTAAGGAAGGTGCGAATAAGCGGGGAAATT
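
To illustrate the idea of mapping k-mers back to a reference, here is a rough Python sketch: scan the reference, mark every position whose 32-mer is in the k-mer set, and merge runs of consecutive positions into regions with 1-based coordinates. This is not unikmer's implementation. `covered_regions` is a made-up name, and the sketch assumes exact, forward-strand matches only.

```python
def covered_regions(ref, kmers, k):
    """Return (start, end, sequence) for each run of consecutive k-mer hits in ref.

    Coordinates are 1-based and inclusive. Assumption: exact forward-strand
    matches only; reverse complements are not considered.
    """
    kmer_set = set(kmers)
    regions = []
    start = end = None  # 0-based start and exclusive end of the current run
    for i in range(len(ref) - k + 1):
        if ref[i:i + k] in kmer_set:
            if start is None:
                start = i
            end = i + k
        elif start is not None:
            # The run of hits just ended: report it and reset.
            regions.append((start + 1, end, ref[start:end]))
            start = None
    if start is not None:
        regions.append((start + 1, end, ref[start:end]))
    return regions


if __name__ == "__main__":
    ref = "CAGAGCGCAGCGAAATAACGCGATATAACATGCTAAGGAAGGTGCGAATAAGCGGGGAAATTACGACGA"
    k = 32
    # Stand-in for kmers.fasta: the 32-mers of the region covered in the example.
    region = ref[6:62]
    kmers = [region[i:i + k] for i in range(len(region) - k + 1)]
    for start, end, seq in covered_regions(ref, kmers, k):
        print(f">ref:{start}-{end}")
        print(seq)
```

With the reference and k-mers from this thread, the sketch should report a single region, 7-62, in line with the unikmer output above.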

That looks like exactly what I want, but I cannot get map to work! Any ideas why?

Error: unknown command "map" for "unikmer"

Run 'unikmer --help' for usage.

unknown command "map" for "unikmer"


I am using the latest version, 0.19.1, but map does not seem to exist in the list of commands.


It seems I forgot to release the new version. Please replace unikmer map with unikmer uniqs.

--- update ----

New version released.


YES, that's excellent!
