Aligners that can handle sequences with Ns
21 months ago
mbramble • 0

Hi, I'm trying to align a mock fastq file that contains sequences with Ns (ambiguous bases) to a typical reference. Bowtie2 won't do it, and from my searches, it seems that others have had trouble finding an aligner that will.
Maybe just sequence searching is the way to go, but it would be easier to use an aligner, because the sequence length is 23, including 5 Ns.

Specifically, I'd like to find an aligner that will align with exact matching to a reference in the following manner and keep all alignments found.

alignment 1:
ATGCNNNNCGA
TACGGGGGGCT

alignment 2:
ATGCNNNNCGA
TACGAAAAGCT

etc.
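
The plain sequence-searching route mentioned above could look roughly like the sketch below. It is not an aligner: it assumes the reference fits in memory as an uppercase string, treats only N as a wildcard (no other IUPAC codes), and searches both strands. The names `find_hits` and `revcomp` are made up for this illustration.

```python
import re

# Only N is treated as ambiguous here; other IUPAC codes are not handled.
BASE_TO_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def revcomp(seq):
    """Reverse complement of a sequence over A, C, G, T, N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_hits(read, reference):
    """Return every exact placement of `read` in `reference`, N matching any base.

    Each hit is (0-based start, strand, reference bases at that position).
    A lookahead is used so that overlapping hits are all kept.
    """
    hits = []
    for strand, query in (("+", read), ("-", revcomp(read))):
        body = "".join(BASE_TO_REGEX[base] for base in query)
        pattern = re.compile("(?=(" + body + "))")
        for match in pattern.finditer(reference):
            hits.append((match.start(), strand, match.group(1)))
    return hits
```

For a "-" hit the returned bases are the forward-strand reference bases under the reverse-complemented read. A read that equals its own reverse complement would be reported once per strand.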


That example does not seem right: it shows an anti-parallel (complementary-strand) representation, as in double-stranded DNA. An aligner will not report alignments that way.

21 months ago

You could align them as protein sequences with a custom scoring matrix where the cost of mismatch for N is zero.
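
To make the scoring idea concrete, here is a minimal plain-Python sketch. It does not run a protein aligner; it only builds a scoring table in which any pair involving N scores zero and uses it for an ungapped scan of the reference. The match/mismatch values (+1/-1) and the function names are assumptions chosen for illustration.

```python
def build_matrix(match=1, mismatch=-1):
    """Scoring table over A, C, G, T, N where any pair involving N scores 0."""
    alphabet = "ACGTN"
    matrix = {}
    for a in alphabet:
        for b in alphabet:
            if "N" in (a, b):
                matrix[a, b] = 0          # N is free: neither rewarded nor penalised
            elif a == b:
                matrix[a, b] = match
            else:
                matrix[a, b] = mismatch
    return matrix


def ungapped_scores(read, reference, matrix):
    """Yield (start, score) for every ungapped placement of read on reference."""
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        yield start, sum(matrix[r, w] for r, w in zip(read, window))


def exact_placements(read, reference, matrix=None):
    """Starts where every non-N base of the read matches the reference."""
    matrix = matrix or build_matrix()
    best = sum(matrix[base, base] for base in read)   # score of a perfect placement
    return [start for start, score in ungapped_scores(read, reference, matrix)
            if score == best]
```

With a real aligner the same table would be supplied as its custom substitution matrix; how that is done depends on the tool and is not shown here.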

21 months ago

How about dropping the Ns from your reads entirely? You could then parse the alignments and keep only those that have a gap of exactly the desired number of Ns.

Since you say it is a mock fastq file, you can probably easily generate it without Ns. Then align, e.g. with minimap2 using the --cs option, which adds a cs tag to each alignment. This allows you to identify gaps of the corresponding length and directly extract the reference bases inside the gap. You may even use --cs=long and get the full alignment pattern directly.
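
As a rough sketch of the parsing step, the function below takes one cs string (short or long form) and returns the reference bases inside the gap if the alignment consists only of matches plus a single deletion of the requested length. The function name `gap_fill` and the "exactly one deletion, nothing else" rule are assumptions for this example; it does not check where in the read the gap falls, and it says nothing about whether minimap2 will align reads this short.

```python
import re

# cs operations: ":12" (short-form match), "=ACGT" (long-form match),
# "*ag" (substitution), "+acg" (insertion), "-acg" (deletion), "~gt12ag" (intron).
CS_OP = re.compile(r":[0-9]+|\*[a-z][a-z]|[=+-][A-Za-z]+|~[a-z]{2}[0-9]+[a-z]{2}")


def gap_fill(cs, gap_len):
    """Return the reference bases in the gap, or None if the alignment does not qualify.

    Qualifying means: only matches plus exactly one deletion of length gap_len.
    """
    if cs.startswith("cs:Z:"):
        cs = cs[len("cs:Z:"):]
    ops = CS_OP.findall(cs)
    if "".join(ops) != cs:
        raise ValueError("unrecognised cs string: " + cs)
    deletions = [op[1:] for op in ops if op[0] == "-"]
    non_matches = [op for op in ops if op[0] in "*+~"]
    if non_matches or len(deletions) != 1 or len(deletions[0]) != gap_len:
        return None
    return deletions[0].upper()
```

For example, a read `ATGCCGA` placed on `ATGCGGGGCGA` would be expected to carry a cs string like `:4-gggg:3`, from which the sketch extracts `GGGG`.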
