Question

Trying to understand STAR fastqLog.final.out File

21 months ago

Daniel ▴ 40

Hello, I am analyzing ribo-seq data and would like to check whether my interpretation of STAR's log file is correct. I do not have extensive bioinformatics/computational experience, so it has been difficult to work out how to proceed (the guides online are pretty limited and assume extensive domain knowledge).

My reading is that this is a poor alignment: about 22% of reads are unmapped (19.34% "too short" plus 2.55% "other"), only 7.86% of reads map to a single location, and most of the rest are multi-mapped (43.34% to multiple loci and 26.91% to too many loci).

The genome index was built with a GTF from which every rRNA line had been removed.

My questions are: is my interpretation of the results correct, and are there any other factors that could be contributing to the poor results?

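# Align every FASTQ file in the raw-data directory with STAR.
mkdir -p aligned_rRNA_removed    # make sure the output directory exists

# Glob directly instead of parsing ls, and quote paths so odd file names survive.
for file in /home/drebibo/Project/orf/siEtAl/rawData/*.fastq; do
    nameLong=$(basename "$file")
    name="${nameLong%%.*}"       # sample name: everything before the first dot

    STAR --runThreadN 25 \
         --genomeDir ./starIndexNorRNA/ \
         --readFilesIn "$file" \
         --outFileNamePrefix "aligned_rRNA_removed/$name" \
         --clip3pAdapterSeq CTGTAGGCACCATCAAT \
         --clip3pAdapterMMp 0.1 \
         --outSAMtype BAM SortedByCoordinate
done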
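
For anyone reading the loop, here is a small sketch of how the sample name is derived. The file path is hypothetical and only there to show the expansion, and the last line assumes STAR appends its fixed output names directly to whatever is passed as --outFileNamePrefix.

```shell
# Hypothetical input path, used only to illustrate the expansions in the loop.
file="/data/rawData/sample1.trimmed.fastq"

nameLong=$(basename "$file")    # sample1.trimmed.fastq
name="${nameLong%%.*}"          # %% removes the longest suffix matching ".*"
short="${nameLong%.*}"          # %  removes the shortest suffix matching ".*"

# Assumption: STAR concatenates the prefix and its own file names as-is,
# so there is no separator between the sample name and "Log.final.out".
log_path="aligned_rRNA_removed/${name}Log.final.out"

echo "$name"
echo "$short"
echo "$log_path"
```

With `%%` everything after the first dot is dropped, so two inputs that differ only after the first dot would end up with the same prefix and overwrite each other's output.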

                             Started job on |       Dec 19 09:13:08
                         Started mapping on |       Dec 19 09:13:44
                                Finished on |       Dec 19 09:15:06
   Mapping speed, Million of reads per hour |       1365.26

                      Number of input reads |       31097531
                  Average input read length |       29
                                UNIQUE READS:
               Uniquely mapped reads number |       2444902
                    Uniquely mapped reads % |       7.86%
                      Average mapped length |       26.00
                   Number of splices: Total |       146725
        Number of splices: Annotated (sjdb) |       82861
                   Number of splices: GT/AG |       142435
                   Number of splices: GC/AG |       2574
                   Number of splices: AT/AC |       127
           Number of splices: Non-canonical |       1589
                  Mismatch rate per base, % |       0.88%
                     Deletion rate per base |       0.02%
                    Deletion average length |       1.04
                    Insertion rate per base |       0.00%
                   Insertion average length |       1.05
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       13476286
         % of reads mapped to multiple loci |       43.34%
    Number of reads mapped to too many loci |       8369644
         % of reads mapped to too many loci |       26.91%
                              UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       6015192
                 % of reads unmapped: too short |       19.34%
                Number of reads unmapped: other |       791507
                     % of reads unmapped: other |       2.55%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%
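
To double-check the percentages I am quoting, this is a rough sketch that pulls the main categories out of a Log.final.out file and adds them up. The function name `summarize_star_log` is made up for this post; the labels it matches are the ones in the log above, and I am assuming the four categories (unique, multiple loci, too many loci, unmapped) are meant to account for all input reads.

```shell
# Summarise the read categories of a STAR Log.final.out file.
# summarize_star_log is a hypothetical helper name, not part of STAR.
summarize_star_log() {
    awk -F'|' '
        # Strip whitespace and the trailing % sign, then force a number.
        function val(s) { gsub(/[ \t%]/, "", s); return s + 0 }

        /Uniquely mapped reads %/            { uniq  = val($2) }
        /% of reads mapped to multiple loci/ { multi = val($2) }
        /% of reads mapped to too many loci/ { many  = val($2) }
        # Three unmapped lines: too many mismatches, too short, other.
        /% of reads unmapped:/               { unmapped += val($2) }

        END {
            printf "uniquely_mapped\t%.2f\n", uniq
            printf "multi_mapped\t%.2f\n",    multi
            printf "too_many_loci\t%.2f\n",   many
            printf "unmapped\t%.2f\n",        unmapped
            # Assumption: these four categories partition the input reads.
            printf "total\t%.2f\n", uniq + multi + many + unmapped
        }
    ' "$1"
}

# Usage (path is whatever prefix was given to STAR):
#   summarize_star_log aligned_rRNA_removed/sampleLog.final.out
```

For the log above this gives 21.89 for unmapped and a total of 100.00, so the categories do add up. As far as I understand, reads in the "too many loci" category are also left out of the alignments, which would mean only about 51% of reads (7.86% + 43.34%) actually end up aligned.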

star alignment ribo-seq • 632 views

Updated 21 months ago by GenoMax 154k • written 21 months ago by Daniel ▴ 40