Hello, I am analyzing Ribo-seq data and want to check whether my interpretation of STAR's log file is correct. I do not have extensive bioinformatics or computational experience, so it has been difficult to work out how to proceed (the guides online are limited and assume extensive domain knowledge).
My reading is that this is a poor alignment: about 22% of reads are unmapped (19.34% "too short" plus 2.55% "other"), only about 8% of reads align to a single location, and roughly 70% are multi-mapped (43.34% to multiple loci plus 26.91% to too many loci).
The genome index was built with a GTF from which all rRNA lines had been removed.
My questions are: Is my interpretation of the results correct? And are there any other factors that could contribute to the poor results? The STAR command I ran and the resulting log are below.
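# Loop over the FASTQ files with a glob instead of parsing ls output,
# and quote the variables so paths with spaces do not break the command.
for file in /home/drebibo/Project/orf/siEtAl/rawData/*.fastq; do
    nameLong=$(basename "$file")
    name="${nameLong%%.*}"   # drop everything from the first dot onward
    STAR --runThreadN 25 \
        --genomeDir ./starIndexNorRNA/ \
        --readFilesIn "$file" \
        --outFileNamePrefix "aligned_rRNA_removed/$name" \
        --clip3pAdapterSeq CTGTAGGCACCATCAAT \
        --clip3pAdapterMMp 0.1 \
        --outSAMtype BAM SortedByCoordinate
done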
Started job on | Dec 19 09:13:08
Started mapping on | Dec 19 09:13:44
Finished on | Dec 19 09:15:06
Mapping speed, Million of reads per hour | 1365.26
Number of input reads | 31097531
Average input read length | 29
UNIQUE READS:
Uniquely mapped reads number | 2444902
Uniquely mapped reads % | 7.86%
Average mapped length | 26.00
Number of splices: Total | 146725
Number of splices: Annotated (sjdb) | 82861
Number of splices: GT/AG | 142435
Number of splices: GC/AG | 2574
Number of splices: AT/AC | 127
Number of splices: Non-canonical | 1589
Mismatch rate per base, % | 0.88%
Deletion rate per base | 0.02%
Deletion average length | 1.04
Insertion rate per base | 0.00%
Insertion average length | 1.05
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 13476286
% of reads mapped to multiple loci | 43.34%
Number of reads mapped to too many loci | 8369644
% of reads mapped to too many loci | 26.91%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 6015192
% of reads unmapped: too short | 19.34%
Number of reads unmapped: other | 791507
% of reads unmapped: other | 2.55%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
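
For reference, this is roughly how the percentages quoted in my interpretation can be tallied from the log. It is only a sketch: the function name `summarize_star_log` is my own (hypothetical), and it assumes the log keeps the `label | value` layout shown above.

```shell
#!/usr/bin/env bash
# Summarize the headline percentages from a STAR Log.final.out.
# summarize_star_log is a hypothetical helper name, not part of STAR.
# Usage: summarize_star_log path/to/Log.final.out
summarize_star_log() {
    awk -F'|' '
        # Strip spaces, tabs and the trailing % sign, then convert to a number.
        function pct(v) { gsub(/[ \t%]/, "", v); return v + 0 }
        /Uniquely mapped reads %/            { unique = pct($2) }
        /% of reads mapped to multiple loci/ { multi  = pct($2) }
        /% of reads mapped to too many loci/ { many   = pct($2) }
        # Three lines match here: too many mismatches, too short, other.
        /% of reads unmapped:/               { unmapped += pct($2) }
        END {
            printf "unique\t%.2f\n", unique
            printf "multi\t%.2f\n", multi
            printf "too_many_loci\t%.2f\n", many
            printf "unmapped\t%.2f\n", unmapped
        }
    ' "$1"
}
```

With the output prefix used in my loop, the log for each sample should be at `aligned_rRNA_removed/${name}Log.final.out`, so inside the loop this would be called as `summarize_star_log "aligned_rRNA_removed/${name}Log.final.out"`.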