6.0 years ago
Hi all, this is my first time mapping RNA-seq reads. The data were generated from corals. I used STAR with default parameters to map the reads to the reference, but got a terrible result. The final mapping log was:
Started job on | Dec 12 16:02:44
Started mapping on | Dec 12 16:03:13
Finished on | Dec 12 16:12:34
Mapping speed, Million of reads per hour | 85.99
Number of input reads | 13400813
Average input read length | 150
Uniquely mapped reads number | 3114
Uniquely mapped reads % | 0.02%
Average mapped length | 124.34
Number of splices: Total | 41
Number of splices: Annotated (sjdb) | 2
Number of splices: GT/AG | 24
Number of splices: GC/AG | 4
Number of splices: AT/AC | 0
Number of splices: Non-canonical | 13
Mismatch rate per base, % | 4.13%
Deletion rate per base | 0.03%
Deletion average length | 1.86
Insertion rate per base | 0.01%
Insertion average length | 1.47
Number of reads mapped to multiple loci | 1505
% of reads mapped to multiple loci | 0.01%
Number of reads mapped to too many loci | 47
% of reads mapped to too many loci | 0.00%
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 99.96%
% of reads unmapped: other | 0.00%
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Any idea why so many reads are unmapped? I don't understand what the reason 'too short' means. Can somebody explain it? Thanks!
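For anyone comparing several runs, the log above can be read programmatically. This is a minimal sketch, assuming the `Log.final.out` text keeps the `key | value` layout shown in the post; the function names `parse_star_log` and `percent` are made up for illustration and are not part of STAR.

```python
def parse_star_log(text):
    """Turn 'key | value' lines of a STAR Log.final.out into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and section headers
        key, value = line.split("|", 1)  # split on the first '|' only
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Read a percentage field such as '99.96%' as a float."""
    return float(stats[key].rstrip("%"))
```

Calling `percent(parse_star_log(text), "% of reads unmapped: too short")` on the log above would give the share of reads dropped in the 'too short' category, which is the number to watch when trying different references or parameters.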
Could you run your data through a quality-control tool like FastQC?
What are your read lengths?
What command line did you use for the alignment?
% of reads unmapped: too short
With STAR, "too short" refers to the alignment, not the read: the aligned part of the read was too short. Are you sure you are using the right reference?
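To make "too short alignment" concrete: as far as I know, STAR by default keeps an alignment only if its score and its number of matched bases both reach 0.66 of the read length (`--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread`; for paired-end reads the length is the sum of both mates). The sketch below illustrates only the matched-bases check in simplified form. It is not STAR's code, and the function name `is_too_short` is hypothetical.

```python
def is_too_short(matched_bases, read_length, min_fraction=0.66):
    """Return True if the matched bases fall below the required fraction of the read.

    min_fraction=0.66 mirrors what I believe is STAR's default for
    --outFilterMatchNminOverLread; treat it as an assumption.
    For paired-end data, pass the summed length of both mates as read_length.
    """
    return matched_bases < min_fraction * read_length
```

For example, a pair of 150-base mates where only one mate aligns (150 matched bases out of 300) would fail this check, even though no individual read is short.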
In fact, I have nine types of coral, and I chose five of them to build reference indexes independently. Unfortunately, the results were similar.
Can you elaborate on this? Are you just using short contigs as a reference (please give stats, e.g. from BBMap's stats.sh)? Are you aligning against a single species?
Have you tried bwa-mem or minimap2 to see what mapping rates they give, for comparison? Have you ever tried alignments to these references before?
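If BBMap is not at hand, basic contig statistics can be computed directly from the reference FASTA. This is a rough sketch, not a replacement for stats.sh; the function names are made up, and N50 is taken here as the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly size.

```python
def contig_lengths(fasta_lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # None until the first '>' header is seen
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the cumulative sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input
```

Opening the reference with `open(path)` and passing the file handle to `contig_lengths` is enough, since a file handle iterates over lines. A reference with many thousands of contigs and an N50 of only a few hundred bases would point toward the short-contig explanation.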
I've noticed that older STAR versions have issues with PE reads that overlap too much.
If you've got PE data, try only R1 first. Otherwise, check your FastQC reports, as Batien mentioned, for adapter contamination or overrepresented sequences indicating other contamination.
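An R1-only test run could be set up along these lines. This is a sketch under assumptions: the paths, the output prefix, and the helper name `star_r1_command` are placeholders, and only common STAR options are used (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--readFilesCommand`, `--outFileNamePrefix`). The function only builds the argument list; running it is left to `subprocess.run`.

```python
def star_r1_command(genome_dir, r1_fastq, out_prefix, threads=8):
    """Build a STAR command line that aligns only the R1 file as single-end reads."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1_fastq,  # R1 only: the R2 file is deliberately left out
        "--outFileNamePrefix", out_prefix,
    ]
    if r1_fastq.endswith(".gz"):
        # Assumes zcat is available to decompress gzipped FASTQ on the fly
        cmd += ["--readFilesCommand", "zcat"]
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the alignment. If the R1-only mapping rate is far higher than the paired run, the mate overlap or pairing is the likely culprit rather than the reference.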