R And Y Characters In ALLPATHS-LG Assembly... Why?
12.9 years ago

I recently ran a whole-genome assembly with ALLPATHS-LG. Several scaffolds in the resulting assembly contain R and Y characters in the sequence. These are the IUPAC symbols for purines (A or G) and pyrimidines (C or T), respectively, but I have no idea why they would show up in the assembly. The ALLPATHS-LG manual doesn't seem to shed any light on the issue.

Do these symbols have an alternative meaning in ALLPATHS-LG, or am I missing something?


ALLPATHS-LG preserves heterozygotes as much as possible. Those are hets it believes to be real in the genome. This is a welcome feature.


The documentation did mention that it attempts to preserve as much ambiguity as possible, but I thought this referred to unresolved repeat regions, homopolymers, etc. I agree that preserving this information is welcome; it just complicates downstream analysis with software that requires simple alphabets.
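Before deciding how to handle downstream tools that need a simple alphabet, it can help to see how many ambiguity codes each scaffold actually contains. The following is only a minimal sketch: the function name `count_ambiguity_codes` and the choice to treat `A`, `C`, `G`, `T` and `N` as the "simple" alphabet are assumptions made for illustration, not anything defined by ALLPATHS-LG.

```python
from collections import Counter

# Assumption: downstream tools accept these characters without complaint.
SIMPLE_ALPHABET = set("ACGTN")


def count_ambiguity_codes(fasta_text):
    """Return {record_name: Counter} of characters outside SIMPLE_ALPHABET.

    `fasta_text` is the content of a FASTA file as a single string.
    """
    counts = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts[name] = Counter()
        elif name is not None:
            # Tally every character that is not a plain base or N.
            counts[name].update(
                c for c in line.upper() if c not in SIMPLE_ALPHABET
            )
    return counts


example = ">scaffold_1\nACGTRACGY\nNNRA\n>scaffold_2\nACGT\n"
for scaffold, tally in count_ambiguity_codes(example).items():
    print(scaffold, dict(tally))
```

Scaffolds with an empty tally can go straight to the downstream software; only the others need any special handling.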


Just convert R randomly to A or G. The majority of assemblers effectively do this.


Thanks! You're welcome to get the accepted answer if you add your comment as an answer.

12.9 years ago
lh3 33k

Okay, I've put my comments into an answer so that this question can have an accepted answer.

ALLPATHS-LG preserves heterozygotes as much as possible. Those are hets it believes to be real in the genome. If you are concerned about those ambiguous bases, you may convert, for example, "R" randomly to "A" or "G". The vast majority of NGS assemblers effectively do this when they pop bubbles.
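The random conversion suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not part of ALLPATHS-LG: the function name `resolve_ambiguity` and the optional `rng` parameter are hypothetical, the table is the standard IUPAC nucleotide ambiguity code, and `N` is deliberately left untouched because it usually marks gaps rather than hets.

```python
import random

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def resolve_ambiguity(seq, rng=None):
    """Replace each IUPAC ambiguity code with one randomly chosen base.

    Plain bases, N, and any other character are passed through unchanged.
    Pass a seeded random.Random as `rng` for reproducible output.
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        choices = IUPAC.get(base.upper())
        if choices is None:
            out.append(base)
        else:
            pick = rng.choice(choices)
            # Preserve soft-masking (lowercase) if the input used it.
            out.append(pick if base.isupper() else pick.lower())
    return "".join(out)


print(resolve_ambiguity("ACGTRACGYNN", rng=random.Random(0)))
```

Note that picking a base at random discards the het call, so it may be worth keeping the original assembly alongside the converted copy.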

Keeping heterozygotes is a very useful feature for a diploid genome. Although many other assemblers claim to keep the bubble information, many of them do not accurately distinguish a het from a sequencing error.
