I have a FASTA file that looks like this:
>rno-miR-322 MIMAT0001619 Rattus norvegicus miR-322
CAGCAGCAAUUCAUGUUUUGGA
>rno-miR-322* MIMAT0000547 Rattus norvegicus miR-322*
AAACAUGAAGCGCUGCAACA
The reads are 22 and 20 nucleotides in length, respectively.
Aligning this file to the rn4 genome with the command
bowtie -f rn4 test_file
produces the output:
rno-miR-322 MIMAT0001619 Rattus norvegicus miR-322 + chrX 22228104 CAGCAGCAACAGGGA IIIIIIIIIIIIIII 11
rno-miR-322* MIMAT0000547 Rattus norvegicus miR-322* - chr8 27772395 TGTTGCGCGCTTCTGTTT IIIIIIIIIIIIIIIIII 0 12:T>C,16:T>G
Note that the aligned reads are too short: 15 and 18 characters instead of 22 and 20. The strange thing is that when I align the reads individually with the -c option, it works:
bowtie -f rn4 -c CAGCAGCAAUUCAUGUUUUGGA
produces
0 - chrX 140000211 TCCAAAACATGAATTGCTGCTG IIIIIIIIIIIIIIIIIIIIII 0
And
bowtie -f rn4 -c AAACAUGAAGCGCUGCAACA
gives
0 - chrX 140000175 TGTTGCAGCGCTTCATGTTT IIIIIIIIIIIIIIIIIIII 0
What might possibly be wrong?
Have you noticed that only the Us are removed? I suspect that U (uracil) isn't supported in an input FASTA file. I could check the source code, but you could confirm this more quickly by checking a few more reads.
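A quick way to test this hypothesis against the two reads in the question is to drop every U and compare the result with what bowtie reported. This is only a minimal sketch: the helper names strip_u and revcomp are made up for illustration, and it assumes the reported sequence is the read itself, reverse-complemented for "-" strand hits.

```python
# Sketch: check whether dropping every U reproduces the truncated reads
# that bowtie reported. Helper names are hypothetical, not part of bowtie.

def strip_u(seq):
    """Return the sequence with all U characters removed."""
    return seq.upper().replace("U", "")

def revcomp(seq):
    """Reverse-complement a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Reads from the FASTA file in the question.
reads = {
    "rno-miR-322": "CAGCAGCAAUUCAUGUUUUGGA",
    "rno-miR-322*": "AAACAUGAAGCGCUGCAACA",
}

# Strand and sequence columns copied from the bowtie output in the question.
reported = {
    "rno-miR-322": ("+", "CAGCAGCAACAGGGA"),
    "rno-miR-322*": ("-", "TGTTGCGCGCTTCTGTTT"),
}

for name, seq in reads.items():
    stripped = strip_u(seq)
    strand, aligned = reported[name]
    # Assumption: "-" strand hits are printed reverse-complemented.
    expected = stripped if strand == "+" else revcomp(stripped)
    print(name, len(seq), len(stripped), expected == aligned)
```

If the last column prints True for both reads, the shortened sequences are exactly the original reads minus their Us.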
That's one of the first things I wondered about, but the person I'm helping said that most such software supports "U"s. I should have checked anyway. Edit: Yes, this was it. Please post it as an answer and I'll accept and upvote.
I'm not sure where the person you're working with came by that idea. bowtie and other similar tools are meant for high-throughput sequencing data, which should rarely, if ever, have a U in the sequence (I've never seen one, at least). Is the person by chance a bench scientist? :)
Yes, it is I that should have known better. Thanks!
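For anyone who hits the same problem, one possible workaround is to rewrite the FASTA file with U replaced by T before aligning. The sketch below is an assumption about how one might do that in plain Python, not something bowtie provides; the function names and the output file name are hypothetical. Header lines are left untouched on purpose, since names like "Rattus norvegicus" contain the letter u.

```python
# Sketch: convert RNA-style FASTA records (with U) to DNA-style (with T).
# Function names and file names are hypothetical examples.

# Translation table mapping U/u to T/t; all other characters pass through.
U_TO_T = str.maketrans("Uu", "Tt")

def rna_to_dna(line):
    """Convert one FASTA line; header lines (starting with '>') are kept as-is."""
    if line.startswith(">"):
        return line
    return line.translate(U_TO_T)

def convert_fasta(src_path, dst_path):
    """Write a copy of src_path to dst_path with U replaced by T in sequences."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rna_to_dna(line))
```

Calling convert_fasta("test_file", "test_file.dna.fa") and then running bowtie on the new file should give full-length reads, assuming the dropped Us were the only problem.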
BBMap supports U in the reference :)
It also supports U in reads if you add the "utot" flag (U to T).