Mapping to a Genome with Ambiguous Reference Characters (R, Y, K, M, S, W, etc.)
14.1 years ago
Rm 8.3k

I am mapping Illumina reads with bowtie/bwa to a reference genome that contains ambiguous reference characters (N, -, R, Y, K, M, S, W, etc.).

For example, at a position where the reference has an ambiguous character such as R (A or G), I want a read carrying either A or G at that position to count as a perfect match.

In the case below (M = A or C, K = G or T), bowtie fails to align the read to the reference:

Reference: ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG
Read:      ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG
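To make the desired matching rule explicit, here is a minimal sketch assuming the standard IUPAC nucleotide codes: each reference character stands for a set of bases, and a read base counts as a match if it belongs to that set. `iupac_mismatches` and `naive_mismatches` are hypothetical helpers for illustration only (ungapped, equal-length comparison), not part of bowtie or bwa; characters outside the table, such as `-`, are assumed to match nothing.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_mismatches(ref: str, read: str) -> int:
    """Count positions where the read base is not allowed by the reference code.

    Hypothetical helper: ungapped comparison of equal-length strings.
    Codes missing from the table (e.g. '-') allow no base, so they always mismatch.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have the same length")
    return sum(
        base.upper() not in IUPAC.get(code.upper(), "")
        for code, base in zip(ref, read)
    )


def naive_mismatches(ref: str, read: str) -> int:
    """Plain character comparison: every ambiguity code counts as a mismatch."""
    return sum(a != b for a, b in zip(ref.upper(), read.upper()))


ref = "ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
read = "ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
print(iupac_mismatches(ref, read))  # 0: every read base is allowed by M or K
print(naive_mismatches(ref, read))  # 4: one per M/K position
```

Under IUPAC-aware matching this read is a perfect hit; a literal comparison sees four mismatches, one at each M or K.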

Can you suggest options to set in bowtie/bwa, or another alignment tool, with which I can achieve this?

I learnt that "Alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie."

Strictly speaking, both bowtie and bwa treat an ambiguous reference base as a random A/C/G/T during mapping, and afterwards they count any alignment to an ambiguous base as a mismatch.
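A rough sketch of the effect this comment describes, assuming exactly that behaviour: ambiguous reference characters are replaced by a random A/C/G/T before indexing, and any aligned position over an originally ambiguous base is scored as a mismatch. `randomize_ambiguous` and `scored_mismatches` are hypothetical names; this models the described outcome, not the aligners' actual source code.

```python
import random

CONCRETE = "ACGT"


def randomize_ambiguous(ref: str, rng: random.Random) -> str:
    """Replace every non-ACGT reference character with a random A/C/G/T."""
    return "".join(c if c in CONCRETE else rng.choice(CONCRETE) for c in ref.upper())


def scored_mismatches(original_ref: str, read: str) -> int:
    """Score a gapless alignment against the ORIGINAL reference.

    An ambiguous reference position counts as a mismatch whatever base the
    read carries there; concrete positions mismatch only if the bases differ.
    """
    return sum(
        c not in CONCRETE or c != b
        for c, b in zip(original_ref.upper(), read.upper())
    )


ref = "ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
read = "ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
indexed = randomize_ambiguous(ref, random.Random(42))  # what would be indexed
print(scored_mismatches(ref, read))  # 4, although M/K allow every read base
```

With four ambiguous positions in one read, the read is penalised as if it carried four mismatches, which can be enough to push it past an aligner's mismatch limit.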

14.1 years ago
lh3 33k

Try GSNAP, Mosaik or novoalign.

If your goal is to reduce reference bias, I am confident that novoalign and GSNAP handle ambiguous bases correctly. Novoalign is not published either, but I have discussed this with its developer; Mosaik is not published, so I do not know whether it handles them correctly. Note that claiming a feature does not necessarily mean the feature is implemented correctly.

Thanks @lh3, I accept your answer, but I am currently trying SOAPaligner/soap2.

I do not know whether soap2 accepts ambiguous bases. In general, soap2 is great, but it does not natively support SAM output, which might cause problems for downstream analyses. If you have to pick one, I would recommend novoalign.

By design, it is hard for a BWT-based aligner to handle ambiguous bases in the expected way.
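One way to see the difficulty, offered as an illustration rather than a description of any particular aligner's internals: a BWT/FM index is built over one concrete sequence, so a reference window containing k two-base ambiguity codes stands for 2^k concrete sequences that would all have to be indexed, or branched over during the search. The sketch below assumes the standard IUPAC table; `expansion_count` and `expand` are hypothetical helpers, and gap characters such as `-` are not handled.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes (no gap character).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expansion_count(ref: str) -> int:
    """Number of concrete A/C/G/T sequences an IUPAC-coded string stands for."""
    return prod(len(IUPAC[c]) for c in ref.upper())


def expand(ref: str):
    """Yield every concrete sequence the IUPAC-coded string represents.

    The number of results grows exponentially with the number of ambiguity codes.
    """
    for combo in product(*(IUPAC[c] for c in ref.upper())):
        yield "".join(combo)


ref = "ATTCAAGCCCMGAGCGTMTATAAKGGAAGCTKCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
read = "ATTCAAGCCCAGAGCGTCTATAATGGAAGCTTCGCGTGTGTATGCATCAATTGGCAAGATGTTGTG"
print(expansion_count(ref))       # 16 (four two-base codes: 2**4)
print(read in set(expand(ref)))   # True
print(expansion_count("N" * 20))  # 1099511627776 (4**20)
```

Expanding a short window is cheap, but a stretch of only 20 Ns already corresponds to about 10^12 concrete sequences, which is why a plain exact-match index cannot simply enumerate the alternatives.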

I am not happy with SOAPaligner; I am currently trying GSNAP.

@lh3, after looking at all three tools you suggested, I am going ahead with Mosaik. Thanks!

14.1 years ago
Andreas ★ 2.5k

RazerS accepts ambiguity characters as well: http://www.seqan.de/projects/razers.html

Andreas

Edit: This does not seem to be fully correct; see the comments. Only N is supported.

Thanks @Andreas. By the way, do you know which RazerS options make it use ambiguous bases in the reference sequence?

In RazerS, I did not find any parameter that supports ambiguous bases except "-mN", which only allows "N" to match any base (A, C, G or T).

I asked the RazerS authors; they say it does not support ambiguous bases other than "N".

Ok. Thanks for your investigation and clarification. Edited my answer.
