Hello,
I have been having problems with Bowtie lately and I don't understand what is going on.
I am working on small RNA-seq data to identify different small RNA species, notably miRNAs. My organism is an animal but not a model species, so no reference is available for it. I decided to use three closely related species that do have references to identify my reads. For miRNAs, which are well conserved, this is a known practice that works well.
So I built my miRNA index from a multi-FASTA file containing all the references from the three species. Surprisingly, Bowtie finds very few miRNA reads: only about 12% align, even when mismatches are allowed.
When I went through the unaligned reads, I was intrigued by one sequence that is very abundant (about 7 million copies) and whose length fits a miRNA perfectly. I ran blastn on it, and it hits a genomic region known to contain a primary miRNA (the precursor that produces mature miRNAs). That region is highly conserved across mammals, so this read must be a miRNA.
Going back to my references, I checked whether this known primary miRNA is present, and it is, for all three species. That means I have at least three references, with no variation, that contain the sequence of my overrepresented read. Yet Bowtie failed to align it for some reason.
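For anyone who wants to repeat that check, here is a minimal sketch in Python of how one could count which reference records contain a given read. The function names and the toy sequences are invented for illustration. It assumes the references may be written in RNA letters (U), so U and T are treated as the same base, and it only looks at the forward strand.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalize(seq):
    """Upper-case the sequence and treat U as T so RNA and DNA compare equal."""
    return seq.upper().replace("U", "T")


def references_containing(read, fasta_lines):
    """Return the headers of reference records that contain the read exactly."""
    query = normalize(read)
    return [h for h, seq in read_fasta(fasta_lines) if query in normalize(seq)]


# Toy references, invented for illustration only (not real data).
example = [
    ">speciesA-mir-x", "GGCUAGCUUAUCAGACUGAUGUUGACUG",
    ">speciesB-mir-x", "GGCTAGCTTATCAGACTGATGTTGACTG",
    ">speciesC-mir-y", "ACGUACGUACGUACGU",
]
hits = references_containing("TAGCTTATCAGACTGATGTTGA", example)
print(hits)
```

If the read shows up in more than one record, as in my case, then the same read has several equally good targets in the combined index.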
Out of curiosity, I mapped my library again using an index built from only one species at a time, and this time Bowtie maps the reads, giving me about 60% miRNA reads.
How exactly does Bowtie's index work? It seems that the more references it has to use, the fewer reads it maps.
I used the default parameters to build my indexes, and the same mapping conditions in all my attempts.
Thanks in advance.
I see, thank you. I'll change my code and build several indexes then.
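A minimal sketch of how I might split the combined multi-FASTA into one file per species before building the separate indexes. It assumes every header starts with a species prefix followed by a hyphen (for example `>hsa-mir-21` in miRBase-style naming). The function names and the `mirna_` file prefix are made up for illustration.

```python
from collections import defaultdict


def split_by_species(fasta_lines):
    """Group FASTA lines by the species prefix found in each header.

    Assumption: the prefix is everything between '>' and the first '-'.
    """
    groups = defaultdict(list)
    species = None
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            species = line[1:].split("-", 1)[0]
        # Skip blank lines and anything that appears before the first header.
        if species is not None and line:
            groups[species].append(line)
    return dict(groups)


def write_groups(groups, prefix="mirna_"):
    """Write one FASTA file per species and return the file paths."""
    paths = []
    for species, lines in groups.items():
        path = f"{prefix}{species}.fa"
        with open(path, "w") as handle:
            handle.write("\n".join(lines) + "\n")
        paths.append(path)
    return paths


# Toy input, invented for illustration only.
toy = [
    ">aaa-mir-1", "ACGT",
    ">bbb-mir-1", "GGCC",
    ">aaa-mir-2", "TTTT",
]
groups = split_by_species(toy)
print(sorted(groups))
```

Each file written by `write_groups` could then be given to `bowtie-build` to make its own index.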