For our Ribo-seq data set I tried the STAR aligner but was able to map only a very small fraction of the reads (<1% in some samples), while most of the reads are discarded for being too short.
What does "too short" mean for STAR? Where can I set the minimum read length?
The Ribo-seq data were first trimmed with cutadapt using the TruSeq adapter sequence. I then mapped the reads to the rRNA and kept only the unmapped reads, to be mapped later against the transcriptome using STAR.
How can I increase the number of mapped reads?
Thanks
Assa
$ cat GSM3152885/GSM3152885.Log.final.out
Started job on | Jan 19 15:04:37
Started mapping on | Jan 19 15:04:39
Finished on | Jan 19 15:08:50
Mapping speed, Million of reads per hour | 393.02
Number of input reads | 27402469
Average input read length | 40
UNIQUE READS:
Uniquely mapped reads number | 173659
Uniquely mapped reads % | 0.63%
Average mapped length | 28.52
Number of splices: Total | 501
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 500
Number of splices: GC/AG | 0
Number of splices: AT/AC | 0
Number of splices: Non-canonical | 1
Mismatch rate per base, % | 2.79%
Deletion rate per base | 0.00%
Deletion average length | 1.12
Insertion rate per base | 0.00%
Insertion average length | 1.00
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 281767
% of reads mapped to multiple loci | 1.03%
Number of reads mapped to too many loci | 12974
% of reads mapped to too many loci | 0.05%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 26934058
% of reads unmapped: too short | 98.29%
Number of reads unmapped: other | 11
% of reads unmapped: other | 0.00%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
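To compare samples, the counts in a log like the one above can be pulled out with a small script. This is only a minimal sketch, assuming the `key | value` layout shown in the log; the function name `parse_star_log` is hypothetical and is not part of STAR.

```python
def parse_star_log(text):
    """Parse the 'key | value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip section headers such as 'UNIQUE READS:'
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


# Three lines copied from the log above, used as in-memory sample input.
sample = """\
Number of input reads | 27402469
Uniquely mapped reads number | 173659
Number of reads unmapped: too short | 26934058
"""

stats = parse_star_log(sample)
total = int(stats["Number of input reads"])
too_short = int(stats["Number of reads unmapped: too short"])
print(f"too short: {100 * too_short / total:.2f}%")
```

For a real run, the file contents could be read with `open(path).read()` and passed to the same function.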
It looks like the read length is 40 bp. Have you tried ungapped mapping, e.g. with bowtie v1.x?
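It may also be worth checking the read lengths after trimming, since the log reports an average input length of 40 but an average mapped length of 28.52. A minimal sketch, assuming an uncompressed FASTQ with four lines per record; `read_length_histogram` is a hypothetical helper, not part of any of the tools mentioned.

```python
from collections import Counter
import io


def read_length_histogram(handle):
    """Count sequence lengths in a four-line-per-record FASTQ stream."""
    lengths = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


# Tiny in-memory example with two made-up reads of length 8 and 4.
demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
hist = read_length_histogram(demo)
for length, count in sorted(hist.items()):
    print(length, count)
```

With a real file, the same function could be called on an open file handle for the trimmed FASTQ. If nearly all reads come out at 40, that might suggest the adapter was not removed, but that is only a guess from the numbers in the log.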