Question

Align raw Nanopore reads (amplicon, long PCR)

16 months ago

garden_giessen ▴ 130

Long PCR with a 6,000 bp amplicon and long indels (>100 bp). The target is a multicopy gene, so a single PCR yields multiple amplicons that differ mainly by indels and carry minimal substitutions (variance ~1%). I have about 2,000 reads per sample, homopolymers (12 bp) and tandem repeats (up to 55-fold, length 12-250 bp), and no reference is available.

I want to denoise my amplicons and generate one consensus sequence for each variant of my target amplicon present in the sample.

I tried MAFFT with its fastest setting, but it is still too slow and does not use multiple cores (the fastest setting is a progressive alignment). I tried supposedly fast aligners such as MAGUS, but it never finished in under one hour per sample. I tried Flye, and it truncates some of the tandem repeats, although other aligners do much worse. I tried the Geneious assembler intended for Sanger data; it works quite well for small read numbers, but this dataset is too large for it. I tried Amplicon_sorter, but it is not sensitive enough to catch the variants. I tried halign, but it does not handle indels properly and over-splits the alignment. I tried k-mer clustering, but it fails because the variants differ in the number of repeat copies, which set-based k-mer comparisons do not capture: two sequences that differ only in repeat count contain the same set of k-mers.
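To illustrate the k-mer point, here is a minimal Python sketch using two hypothetical toy sequences (made-up flanks and a made-up 6 bp repeat unit, present 3 times vs 5 times). Compared as sets, their 5-mers are identical, so presence/absence-based clustering cannot separate them; only the k-mer abundances differ, and only for k-mers inside the repeat. This holds for the toy case shown (k = 5, shorter than the unit) and ignores sequencing errors, so it is an illustration rather than a fix for the real data.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy variants: identical flanks around a 6 bp repeat unit,
# present as 3 copies in variant_a and 5 copies in variant_b.
flank_left, flank_right, unit = "GGCATTCA", "TTAGCCAT", "ACGTTG"
variant_a = flank_left + unit * 3 + flank_right
variant_b = flank_left + unit * 5 + flank_right

k = 5  # shorter than the repeat unit in this toy case
counts_a = kmer_counts(variant_a, k)
counts_b = kmer_counts(variant_b, k)

# Presence/absence view (what set-based k-mer comparison sees): no difference.
print(set(counts_a) == set(counts_b))  # True

# Abundance view: only the k-mers inside the repeat differ, each by +2
# (one extra occurrence per extra copy).
diff = {km: counts_b[km] - counts_a[km] for km in counts_b if counts_b[km] != counts_a[km]}
print(len(diff), set(diff.values()))  # 6 {2}
```

Comparing k-mer abundances instead of presence is what makes the extra copies visible in this toy case; whether that survives Nanopore homopolymer and indel errors on real reads is not shown here.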

Any ideas? Most assemblers do not recognize that no repeat correction is needed (my reads are all complete, i.e. they span the amplicon from start to finish), and therefore they introduce many errors. For the same reason I do not want my reads extended, yet some assemblers somehow manage to extend them anyway.

This is what the data looks like after a MAFFT alignment and removal of all nucleotides with less than 5% frequency per column: [image not available]
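The per-column frequency filter described above can be sketched roughly as follows. `mask_rare_bases` is a hypothetical helper, not part of MAFFT or any other tool, and it assumes the alignment is already loaded as equal-length strings with `-` as the gap character and that frequency is computed over all rows, gaps included. Non-gap characters below the threshold are replaced by gaps, and columns that end up gap-only are dropped.

```python
from collections import Counter


def mask_rare_bases(alignment: list[str], min_freq: float = 0.05, gap: str = "-") -> list[str]:
    """Mask characters whose per-column frequency is below min_freq.

    Assumption: frequency = count / number of rows (gaps included in the denominator).
    Rare non-gap characters are replaced by `gap`; columns left gap-only are dropped.
    """
    if not alignment:
        return []
    n_rows = len(alignment)
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    kept_columns = []
    for col in range(width):
        column = [row[col] for row in alignment]
        counts = Counter(column)
        masked = [
            ch if ch == gap or counts[ch] / n_rows >= min_freq else gap
            for ch in column
        ]
        # Keep the column only if at least one real base survives the filter.
        if any(ch != gap for ch in masked):
            kept_columns.append(masked)

    # Transpose the surviving columns back into one string per sequence.
    return ["".join(column[r] for column in kept_columns) for r in range(n_rows)]


# Toy example (hypothetical data): 25 aligned reads, one carrying a single-read
# insertion (1/25 = 4%) and one carrying a single-read substitution.
reads = ["AC-GT"] * 23 + ["ACTGT", "AG-GT"]
filtered = mask_rare_bases(reads)
print(filtered[0], filtered[23], filtered[24])  # ACGT ACGT A-GT
```

Whether gaps should count toward the denominator is a design choice; excluding them would make rare bases in gap-rich columns look more frequent.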

Tag: Nanopore

Answer 1 · 2023-08-16

16 months ago

cfos4698 ★ 1.1k

Have you tried amplicon_sorter?

Yes, it does not work. It requires an 80% distance between amplicons. It is meant to distinguish different genes, not variants of the same gene.

Reply by garden_giessen, 16 months ago