Hello,
I've aligned single-cell RNA-seq reads to mm10 using STAR. Only about 13% of the reads map uniquely, and 79% are reported as unmapped because they are "too short".
I get the following output:
Started job on | Mar 09 14:04:53
Started mapping on | Mar 09 14:07:01
Finished on | Mar 09 14:23:11
Mapping speed, Million of reads per hour | 67.13
Number of input reads | 18088226
Average input read length | 47
UNIQUE READS:
Uniquely mapped reads number | 2298713
Uniquely mapped reads % | 12.71%
Average mapped length | 44.12
Number of splices: Total | 54580
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 51443
Number of splices: GC/AG | 601
Number of splices: AT/AC | 27
Number of splices: Non-canonical | 2509
Mismatch rate per base, % | 6.80%
Deletion rate per base | 0.02%
Deletion average length | 1.51
Insertion rate per base | 0.02%
Insertion average length | 1.40
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 1405637
% of reads mapped to multiple loci | 7.77%
Number of reads mapped to too many loci | 95119
% of reads mapped to too many loci | 0.53%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 78.96%
% of reads unmapped: other | 0.03%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
These two pieces of information appear to contradict each other:
1) 79% of reads are unmapped as "too short"
2) Average input read length 47 nucleotides.
I looked at the fastq file and there aren't many short reads. I don't understand what went wrong.
What explains the poor alignment?
Have you checked the read-length distribution with FastQC?
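If FastQC is not at hand, the same check can be sketched in a few lines of Python. This is only a rough sketch: it assumes a plain four-line-per-record FASTQ (no wrapped sequences), and the function names and the demo records are made up for illustration.

```python
from collections import Counter
import gzip
import io


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_length_counts(handle):
    """Count read lengths, assuming exactly 4 lines per FASTQ record."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def fraction_shorter_than(counts, min_len):
    """Fraction of reads whose length is strictly below min_len."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    short = sum(n for length, n in counts.items() if length < min_len)
    return short / total


# Tiny in-memory demo with two invented reads (8 nt and 3 nt).
demo = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\nIII\n")
counts = read_length_counts(demo)
print(sorted(counts.items()))
print(fraction_shorter_than(counts, 5))
```

For a real file you would pass `open_fastq("reads.fastq.gz")` (a placeholder path) to `read_length_counts` instead of the in-memory demo.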
Hi b.nota, here is the plot:

![enter image description here][1]

It just looks zoomed out. Yes, some reads are too short, but this should only be a fraction of a percent.

[1]: https://postimg.org/image/5xdvullcz/
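In STAR's log, "too short" refers to the aligned portion of a read being too short relative to the read length, not to the input read itself, so the two numbers do not actually contradict each other. You can change the minimum length manually; hopefully this helps. See this previous post: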
STAR Aligner minimum read-length
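As a rough illustration of what changing the minimum means: as far as I understand STAR's defaults, `--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread` are both 0.66, so an alignment is only kept if it covers about two thirds of the read. The sketch below computes that cutoff for a 47-nt read and assembles a STAR command with a relaxed fraction. The helper names, paths, thread count, and the 0.3 value are placeholders chosen for illustration, not a recommendation.

```python
import math


def min_matched_bases(read_length, min_over_lread=0.66):
    """Smallest matched-base count that satisfies the fractional threshold."""
    return math.ceil(min_over_lread * read_length)


def star_command(genome_dir, fastq, out_prefix, min_over_lread=0.66, threads=4):
    """Build (but do not run) a STAR command line as a list of arguments.

    All paths are hypothetical; only the two outFilter* flags differ
    from a default run.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", out_prefix,
        "--outFilterScoreMinOverLread", str(min_over_lread),
        "--outFilterMatchNminOverLread", str(min_over_lread),
    ]


# With the default fraction, a 47-nt read needs this many matched bases.
print(min_matched_bases(47))

# Relaxed thresholds; 0.3 is an arbitrary example value.
cmd = star_command("mm10_index/", "cells.fastq", "relaxed_", min_over_lread=0.3)
print(" ".join(cmd))
```

Lowering the fraction lets shorter partial alignments through, so it is worth checking whether the rescued reads look plausible, or whether trimming adapters or low-quality tails before mapping would be the cleaner fix.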