RNA-seq short-read mapping with wildcards (IUPAC codes) using GEM mapper
0
1
Entering edit mode
10.3 years ago
vitor.aguiar ▴ 10

I've been facing a problem with wildcards (IUPAC ambiguity codes) when using GEM mapper to map RNA-seq short reads against a custom reference sequence (index).

As far as I understood by reading the documentation of gem-indexer and gem-mapper, it would be OK if I had N's either in my index or in my reads. However, I made a small example of an index and a paired-end read with N's, and the mapping isn't working as I expected. GEM returns mismatches at the positions in which there is an N, instead of considerig N-[AGCT] as a perfect match.

So, using the example indices and reads below, I get mismatches when trying to map a paired-end read without N's against an index with N's, and vice-versa.

Is there any argument I should be using in order to consider N's as wildcards and return N-[AGCT] as a perfect match?

INDEX with N:

ATGGCCGTCATGGCGCCCCGAACCCTCGTCCTGCTACTCTCGGGGGCTCTGGCCCTGACCCA
GACCTGGGCGGGCTCTCACTCCATGAGGTATTTCTACACCTCCGTGTCCCGGCCCGGCCGCG
GGGAGCCCCGCTTCATCGCCGTGGGCTACGTGGACGACACGCAGTTCGTGCGGTTCGACAGC
GACGCCGCGAGCCAGAGGATGGAGCCGCGGGCGCCGTGGATAGAGCAGGAGGGTCCGGAGTA
TTGGGACGGGGAGACACGGAAAGTGAAGGCCCACTCACAGACTCACCGAGTGGACCTGGGGA
CCCTGCGCGGCTACTACAACCAGAGCGAGGCCGGTTCTCACACCGTCCAGAGGATGTATGGC
TGCGACGTGGGGTCGGACTGGCGCTTCCTCCGCGGGTACCACCAGTACGCCTACGACGGCAA
GGATTACATCGCCCTGAAAGAGGACCTGCGCTCTTGGACCGCGGCGGACATGGCAGCTCAGA
CCACCAAGCACAAGTGGGAGGCGGCCCATGTGGCGGAGCAGTTGAGAGCCTACCTGGAGGGC
ACGTGCGTGGAGTGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCGCACGGA
NGCCCCCAAAACGCATATGACTCACCANGCTGTCTCNGACCATGAAGCCACCCTGAGGTGCT
GGGCCCTGAGCT

INDEX without Ns:

ATGGCCGTCATGGCGCCCCGAACCCTCGTCCTGCTACTCTCGGGGGCTCTGGCCCTGACCCA
GACCTGGGCGGGCTCTCACTCCATGAGGTATTTCTACACCTCCGTGTCCCGGCCCGGCCGCG
GGGAGCCCCGCTTCATCGCCGTGGGCTACGTGGACGACACGCAGTTCGTGCGGTTCGACAGC
GACGCCGCGAGCCAGAGGATGGAGCCGCGGGCGCCGTGGATAGAGCAGGAGGGTCCGGAGTA
TTGGGACGGGGAGACACGGAAAGTGAAGGCCCACTCACAGACTCACCGAGTGGACCTGGGGA
CCCTGCGCGGCTACTACAACCAGAGCGAGGCCGGTTCTCACACCGTCCAGAGGATGTATGGC
TGCGACGTGGGGTCGGACTGGCGCTTCCTCCGCGGGTACCACCAGTACGCCTACGACGGCAA
GGATTACATCGCCCTGAAAGAGGACCTGCGCTCTTGGACCGCGGCGGACATGGCAGCTCAGA
CCACCAAGCACAAGTGGGAGGCGGCCCATGTGGCGGAGCAGTTGAGAGCCTACCTGGAGGGC
ACGTGCGTGGAGTGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCGCACGGA
AGCCCCCAAAACGCATATGACTCACCAAGCTGTCTCAGACCATGAAGCCACCCTGAGGTGCT
GGGCCCTGAGCT

READ 1

@read.1
GAGAGCCTACCTGGAGGGCACGTGCGTGGAGTGGCTCCGCAGATACCTGGAGAACGGGAAGGAGACGCTGCAGCG
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

READ 2 with N

@read.2
AGCTCAGGGCCCAGCACCTCAGGGTGGCTTCATGGTCNGAGACAGCNTGGTGAGTCATATGCGTTTTGGGGGCNT
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

READ 2 without N

@read.2
AGCTCAGGGCCCAGCACCTCAGGGTGGCTTCATGGTCTGAGACAGCTTGGTGAGTCATATGCGTTTTGGGGGCTT
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I'm running gem with the following commands:

GEM Indexer:

gem-indexer -i index_withN.fasta -o index_withN

GEM Mapper

# the paired-end read will not map with this command:
gem-mapper -I index_withN.gem -p -1 read1.fastq -2 read2_withoutN.fastq -q ignore -m 0 -o test

# the paired-end read will map with mismatches at N's with this command:
gem-mapper -I index_withN.gem -p -1 read1.fastq -2 read2_withoutN.fastq -q ignore -m 0 -o test --mismatch-alphabet ACGNT
alignment RNA-Seq GEM • 3.3k views
ADD COMMENT
0
Entering edit mode

it is probably best if you asked the author, it seems like a bug IMO, let us know how it turns out

ADD REPLY
0
Entering edit mode

I already did. I sent an email to GEM and gemTools authors weeks ago, but unfortunately they didn't reply.

ADD REPLY
0
Entering edit mode

Did you ever got this problem solved? Just that I am also looking for an RNA-seq read mapper that would not consider mapped positions to an N reference as a mismatch? Are you aware of any other mappers that would do it correctly perhaps?

ADD REPLY

Login before adding your answer.

Traffic: 1967 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6