Bowtie Outputs Hits That Are Shorter Than The Reads Aligned
1
1
Entering edit mode
11.0 years ago

I have a file that looks like

>rno-miR-322 MIMAT0001619 Rattus norvegicus miR-322

CAGCAGCAAUUCAUGUUUUGGA

>rno-miR-322* MIMAT0000547 Rattus norvegicus miR-322*

AAACAUGAAGCGCUGCAACA

The reads are 22 and 20 nucleotides in length, respectively.

Aligning this file to the rn4 genome with the command

bowtie -f rn4 test_file

produces the output:

rno-miR-322 MIMAT0001619 Rattus norvegicus miR-322 + chrX 22228104 CAGCAGCAACAGGGA IIIIIIIIIIIIIII 11
rno-miR-322* MIMAT0000547 Rattus norvegicus miR-322* - chr8 27772395 TGTTGCGCGCTTCTGTTT IIIIIIIIIIIIIIIIII 0 12:T>C,16:T>G

Observe that the length of the reads aligned are too short, 15 and 18 characters. Strange thing is, when I try to align the reads individually with the -c option it works:

bowtie -f rn4 -c CAGCAGCAAUUCAUGUUUUGGA

produces

0 - chrX 140000211 TCCAAAACATGAATTGCTGCTG IIIIIIIIIIIIIIIIIIIIII 0

And

bowtie -f rn4 -c AAACAUGAAGCGCUGCAACA

gives

0 - chrX 140000175 TGTTGCAGCGCTTCATGTTT IIIIIIIIIIIIIIIIIIII 0

What might possibly be wrong?

bowtie • 2.4k views
ADD COMMENT
2
Entering edit mode

Have you noticed that it removes only the Us? I suspect that specifying a uridine isn't supported in an input fasta file. I could check the source code, but if you just check a few more reads, you could more quickly confirm that.

ADD REPLY
0
Entering edit mode

That's one of the first things I wondered about but the person I'm helping said that most such software supported "U"s. Should have checked it anyways. Edit: Yes, this was it. Please post as answer and I'll accept and upvote.

ADD REPLY
1
Entering edit mode

I'm not sure where the person you're working with came by that idea. bowtie and other similar tools are meant for high-throughput sequencing data, which should rarely, if ever, have a U in the sequence (I've never seen one at least). Is the person by chance a bench scientist? :)

ADD REPLY
1
Entering edit mode

Yes, it is I that should have known better. Thanks!

ADD REPLY
0
Entering edit mode

BBMap supports U in the reference :)

It supports U in reads, also, if you add the "utot" flag (U to T).

ADD REPLY
0
Entering edit mode
11.0 years ago

Here is a python script that evades the problem by using the -c option. If anyone knows what or just have some guesses about what is happening in the question above please do help.

from __future__ import print_function
import sys
import subprocess

list_of_reads_to_align = []

with open(sys.argv[1]) as input_file:
    for line in input_file:
        if not line.startswith(">"):
            list_of_reads_to_align.append(line.strip())

args_noheader = 'bowtie -f --sam-nohead --best --strata -m 10 -S rn4 -c {0}'
args = 'bowtie -f --best --strata -m 10 -S rn4 -c {0}'

for counter, read in enumerate(list_of_reads_to_align):
    if counter != 0:
        cmd = subprocess.Popen(args_noheader.format(read), shell = True, stdout = subprocess.PIPE, close_fds=True, stderr = subprocess.PIPE)
    else:
        cmd = subprocess.Popen(args.format(read), shell = True, stdout = subprocess.PIPE, close_fds=True, stderr = subprocess.PIPE)

    for line in cmd.stdout:
        print(line, end="")
ADD COMMENT

Login before adding your answer.

Traffic: 2662 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6