I'm working with human small RNA-Seq data (50 bp, single end). Surprisingly, I'm able to align only a tiny fraction of the reads (<1% with 3 mismatches, <5% with 6 mismatches). I have tested 2 aligners (bowtie and gem-mapper) and got similar results. Do you have any idea why that is?
Is it possible that you are missing something really simple? What are bowtie's default options for mapping RNA-seq reads onto a reference genome? Are you mapping to a reference genome or an exome? You may want to try TopHat, which runs Bowtie first for short-read alignment but can also align reads across splice junctions.
@DK: I trimmed each read at the first base with quality <20 and discarded reads shorter than 31 bases.
@Dan: I'm mapping onto hg18 (bowtie --sam --all -n3 -l21). Optionally hard clipping -5 10 or -3 10
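For reference, the quality trimming and length filter described in the reply to DK can be sketched roughly as below. This is only an illustration, not the tool that was actually used: the function names are hypothetical, and the Phred+33 quality encoding is an assumption (older Illumina pipelines used Phred+64).

```python
def quality_trim(seq, qual, min_q=20, offset=33):
    """Cut the read at the first base whose Phred score is below min_q.

    offset=33 assumes Phred+33 encoded qualities (an assumption; pass 64
    for older Illumina files).
    """
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:
            return seq[:i], qual[:i]
    return seq, qual


def keep_read(seq, min_len=31):
    """Length filter: keep only reads with at least min_len bases."""
    return len(seq) >= min_len


if __name__ == "__main__":
    # Toy read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    seq, qual = quality_trim("ACGTACGT", "IIII#III")
    print(seq, qual, keep_read(seq))
```

Note that this step only looks at base qualities; it does not remove adapter sequence from the 3' end of the read.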
When doing small RNA-seq, the main fraction of your reads is around 22-24 nt in length (the miRNA fraction). Without clipping the adapter sequences, there is no way to map most of the reads from your experiment. I would recommend clipping the full adapter sequence (not just 10 nt) and then looking at your length distribution. You should see a peak at 22-24 nt. If not, there is something wrong with your experiment.
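A minimal sketch of that check, assuming exact matching of the 3' adapter: the function names are hypothetical, the adapter must be whatever your library prep actually used, and a dedicated trimmer such as cutadapt also tolerates sequencing errors in the adapter, which this toy version does not.

```python
from collections import Counter


def clip_adapter(read, adapter, min_overlap=8):
    """Remove a 3' adapter from a read (exact matches only).

    Scans left to right for the first position where the rest of the read
    is the adapter or a prefix of it (at least min_overlap bases long).
    Anything after a full adapter copy is discarded as well.
    """
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:start + len(adapter)]
        if adapter.startswith(tail):
            return read[:start]
    return read  # no adapter found, read left unchanged


def length_distribution(reads, adapter):
    """Count insert lengths after adapter clipping."""
    return Counter(len(clip_adapter(r, adapter)) for r in reads)


if __name__ == "__main__":
    # Placeholder adapter and a toy 22 nt insert, for illustration only.
    adapter = "TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC"
    insert = "TGAGGTAGTAGGTTGTATAGTT"
    read = (insert + adapter)[:50]  # 50 bp read: insert plus 28 nt of adapter
    for length, count in sorted(length_distribution([read], adapter).items()):
        print(length, count)
```

With a 50 bp read and a 22 nt insert, 28 nt of the read are adapter, so hard clipping 10 nt from either end still leaves 18 nt or more of non-genomic sequence in the read.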
And the adaptor for the Illumina HiSeq 2000 miRNA protocol is TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC. When I search for it in miRBase, some mature miRNA accessions come up, so how can I tell whether such sequences are adapters or mature miRNA sequences? And why are there different Illumina adaptors, like TCGTATGCCGTCTTCTGCTTGT (A: Problems with analysis of small RNAseq data - Adapter trimming)?
Check the size of the inserts. You may be running into adapter sequences. Mapping with LAST will give you all unique mappings, assuming your inserts are long enough. Also, mapping to miRBase makes more sense when you have 21 bp inserts.
If the problem really is caused by adapters, then I would recommend MicroRazerS (paper: "MicroRazerS: rapid alignment of small RNA reads"). Reads can be of arbitrary length and can contain adapter sequence at the 3' end; you can find the rest of the information here.
Have you tried running FastQC on the reads? What do the qualities look like?
Show us a sequence you think should have aligned.