Align raw Nanopore reads (amplicon, long PCR)
15 months ago

Setup: long-range PCR amplicons of about 6000 bp from a multicopy gene, so one PCR yields several amplicons. These amplicons differ mainly by long indels (>100 bp) and carry very few substitutions (variance ~1%). There are about 2000 reads per sample. The sequence contains homopolymers (12 bp) and tandem repeats (up to 55-fold, length 12-250 bp). No reference is available.

I want to de-noise my amplicons and generate one consensus sequence for each of the included variants of my target amplicon.

I tried MAFFT with the fastest setting, but it is still too slow and does not use multiple cores (the fastest setting is a progressive alignment). I tried supposedly fast aligners such as MAGUS, but it never finished in under one hour per sample. I tried Flye, and it cuts some of the tandem repeats, although other tools do much worse. I tried the Geneious assembler meant for Sanger data; it works quite well for small read numbers, but this data set is too large for it. I tried Amplicon_sorter, but it is not sensitive enough to separate the variants. I tried halign, but it does not handle indels properly and oversplits the alignment. I tried k-mer clustering, but it does not work because the variants differ in the number of repeats, and k-mer comparisons do not reflect this (two sequences with different repeat counts generate the same k-mers).
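To illustrate the k-mer problem, here is a minimal sketch with made-up sequences. The flanks, the 12 bp repeat unit, the copy numbers and k = 8 are all hypothetical values chosen for the toy example, not taken from my data. It shows that two error-free variants that differ only in repeat copy number have identical k-mer sets, so any presence/absence k-mer distance between them is zero.

```python
from collections import Counter


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (presence/absence only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_counts(seq, k):
    """Return how often each k-mer occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy variants: same flanks, same repeat unit, different copy number.
LEFT_FLANK = "ACGTTGCA"
RIGHT_FLANK = "TGCATCGA"
REPEAT_UNIT = "CAGGTTACCTGA"  # 12 bp, invented for this example

variant_a = LEFT_FLANK + REPEAT_UNIT * 10 + RIGHT_FLANK
variant_b = LEFT_FLANK + REPEAT_UNIT * 25 + RIGHT_FLANK

K = 8  # shorter than the repeat unit, as is typical for k-mer clustering

same_kmers = kmer_set(variant_a, K) == kmer_set(variant_b, K)
same_counts = kmer_counts(variant_a, K) == kmer_counts(variant_b, K)
length_diff = len(variant_b) - len(variant_a)

print("identical k-mer sets:  ", same_kmers)
print("identical k-mer counts:", same_counts)
print("length difference (bp):", length_diff)
```

In this clean toy case the k-mer counts and the sequence lengths still differ even though the k-mer sets do not. Whether that difference survives Nanopore noise in real reads is a separate question that this sketch does not answer.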

Any ideas? Most assemblers do not understand that my repeats need no correction: every read is complete, i.e. it spans the amplicon from start to finish. Because they try to correct the repeats anyway, they introduce a lot of errors. For the same reason I do not want my reads extended, yet some assemblers still find a way to extend them.

This is what the data looks like after a MAFFT alignment and removal of all nucleotides with less than 5% frequency per column: (image not preserved)
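The per-column 5% filter can be sketched roughly as follows. This is an assumption about how such a filter might look, not the exact procedure used for the picture: the function name `mask_rare_column_states` is made up, the input is assumed to be a list of equal-length aligned rows, and rare characters are replaced with `N` rather than being deleted.

```python
from collections import Counter


def mask_rare_column_states(rows, min_freq=0.05, mask="N"):
    """Mask characters whose frequency within their alignment column is below min_freq.

    rows: aligned sequences of equal length (gaps as '-').
    Assumption: rare characters are replaced by `mask`; the column itself is kept.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")

    n_rows = len(rows)
    filtered_columns = []
    for column in zip(*rows):  # iterate over alignment columns
        counts = Counter(column)
        # Characters that reach the frequency threshold in this column.
        kept = {ch for ch, n in counts.items() if n / n_rows >= min_freq}
        filtered_columns.append([ch if ch in kept else mask for ch in column])

    # Transpose back from columns to rows.
    return ["".join(chars) for chars in zip(*filtered_columns)]


# Toy alignment: 39 identical rows plus one row with a single-base difference.
toy_rows = ["ACGT"] * 39 + ["ATGT"]
filtered = mask_rare_column_states(toy_rows)
print(filtered[0], filtered[-1])
```

Here the lone `T` in the second column occurs in 1 of 40 rows (2.5%), so it falls below the threshold and is masked, while the other rows are unchanged.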

Nanopore • 1.1k views
15 months ago
cfos4698 ★ 1.1k

Have you tried amplicon_sorter?


Yes, and it does not work for this. It requires 80% distance between amplicons. It is meant to identify different genes, not variants of the same gene.
