Trying to understand the STAR Log.final.out file
11 months ago
Daniel

Hello, I am analyzing ribo-seq data and am trying to understand whether my interpretation of STAR's log file is correct. I do not have extensive bioinformatics/computational experience, so it has been difficult to work out how to proceed (the guides online are fairly limited and assume extensive domain knowledge).

Ultimately, this looks like a poor alignment: about 22% of reads are unmapped (mostly "too short", 19.34%), and only 7.86% of reads map to a single location, while most of the rest are multi-mapped (43.34% to multiple loci and 26.91% to too many loci).
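
For reference, in the Log.final.out below the mapping categories add up exactly to the number of input reads (chimeric reads are 0 here), so the figures quoted above can be checked by summing them:

$$
\begin{aligned}
\text{unmapped} &= \underbrace{0.00\%}_{\text{too many mismatches}} + \underbrace{19.34\%}_{\text{too short}} + \underbrace{2.55\%}_{\text{other}} = 21.89\% \\
\underbrace{7.86\%}_{\text{unique}} + \underbrace{43.34\%}_{\text{multiple loci}} + \underbrace{26.91\%}_{\text{too many loci}} + 21.89\% &= 100.00\% \\
2\,444\,902 + 13\,476\,286 + 8\,369\,644 + 0 + 6\,015\,192 + 791\,507 &= 31\,097\,531 \text{ input reads}
\end{aligned}
$$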

The genome index was built with a GTF file from which all rRNA lines had been removed.
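
One way to produce such a GTF is to drop every line whose attributes mark it as rRNA. The sketch below is a minimal example, not necessarily how this index was built: the function name `filter_rrna_gtf` is hypothetical, and it assumes an Ensembl-style GTF where the biotype is stored in `gene_biotype`/`transcript_biotype` (GENCODE files use `gene_type`/`transcript_type` instead).

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a GTF while dropping rRNA annotation lines.
# Assumes Ensembl-style attributes, i.e. gene_biotype / transcript_biotype
# set to "rRNA" or "Mt_rRNA"; adjust the pattern for GENCODE (gene_type).
# Note: this only edits the annotation. The rRNA sequences stay in the genome
# FASTA used for index generation, so reads can still align to them.
filter_rrna_gtf() {
    local in_gtf="$1" out_gtf="$2"
    # -v keeps every line that does NOT match; header lines (#!) are kept
    grep -v -E '(gene|transcript)_biotype "(Mt_)?rRNA"' "$in_gtf" > "$out_gtf"
}
```

Usage, with hypothetical file names: `filter_rrna_gtf genes.gtf genes.norRNA.gtf`, then pass `genes.norRNA.gtf` to `--sjdbGTFfile` when running `STAR --runMode genomeGenerate`.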

My questions are: is my interpretation of the results correct? And are there any other factors that could contribute to the poor results?

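# Create the output directory up front so STAR can write its files there
mkdir -p aligned_rRNA_removed

# Iterate over the FASTQ files with a glob (safer than parsing `ls` output)
for file in /home/drebibo/Project/orf/siEtAl/rawData/*.fastq; do
    nameLong=$(basename "$file")
    name="${nameLong%%.*}"    # sample name: file name up to the first dot

    # Align single-end reads, clipping the 3' adapter (up to 10% mismatches)
    STAR --runThreadN 25 \
         --genomeDir ./starIndexNorRNA/ \
         --readFilesIn "$file" \
         --outFileNamePrefix "aligned_rRNA_removed/$name" \
         --clip3pAdapterSeq CTGTAGGCACCATCAAT \
         --clip3pAdapterMMp 0.1 \
         --outSAMtype BAM SortedByCoordinate
done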
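
With this prefix, STAR writes one `<name>Log.final.out` per FASTQ file into `aligned_rRNA_removed/`. To compare samples side by side, here is a minimal sketch that prints one tab-separated row of percentages per sample. The function name `tabulate_star_logs` is hypothetical, and it assumes the `label | value` line layout shown in the log below.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR): one row of mapping percentages per
# sample. Assumes <name>Log.final.out files in the given directory
# (default: aligned_rRNA_removed) with "label | value" lines.
tabulate_star_logs() {
    local dir="${1:-aligned_rRNA_removed}"
    local log
    printf 'sample\tunique\tmulti\ttoo_many\tunmapped\n'
    for log in "$dir"/*Log.final.out; do
        [ -e "$log" ] || continue   # glob matched nothing
        awk -F'|' -v s="$(basename "$log" Log.final.out)" '
            # Strip surrounding spaces/tabs from a field
            function trim(x) { gsub(/^[ \t]+|[ \t]+$/, "", x); return x }
            { label = trim($1); value = trim($2); sub(/%$/, "", value) }
            label == "Uniquely mapped reads %"            { u = value }
            label == "% of reads mapped to multiple loci" { m = value }
            label == "% of reads mapped to too many loci" { t = value }
            # Sum the three "unmapped" categories (mismatches, too short, other)
            label ~ /^% of reads unmapped: /              { un += value }
            END { printf "%s\t%s\t%s\t%s\t%.2f\n", s, u, m, t, un }
        ' "$log"
    done
}
```

Usage: `tabulate_star_logs aligned_rRNA_removed`. The unmapped column adds up the three "unmapped" lines, so it should match the sum of the percentages reported separately in each log.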

                             Started job on |       Dec 19 09:13:08
                         Started mapping on |       Dec 19 09:13:44
                                Finished on |       Dec 19 09:15:06
   Mapping speed, Million of reads per hour |       1365.26

                      Number of input reads |       31097531
                  Average input read length |       29
                                UNIQUE READS:
               Uniquely mapped reads number |       2444902
                    Uniquely mapped reads % |       7.86%
                      Average mapped length |       26.00
                   Number of splices: Total |       146725
        Number of splices: Annotated (sjdb) |       82861
                   Number of splices: GT/AG |       142435
                   Number of splices: GC/AG |       2574
                   Number of splices: AT/AC |       127
           Number of splices: Non-canonical |       1589
                  Mismatch rate per base, % |       0.88%
                     Deletion rate per base |       0.02%
                    Deletion average length |       1.04
                    Insertion rate per base |       0.00%
                   Insertion average length |       1.05
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       13476286
         % of reads mapped to multiple loci |       43.34%
    Number of reads mapped to too many loci |       8369644
         % of reads mapped to too many loci |       26.91%
                              UNMAPPED READS:
  Number of reads unmapped: too many mismatches |       0
       % of reads unmapped: too many mismatches |       0.00%
            Number of reads unmapped: too short |       6015192
                 % of reads unmapped: too short |       19.34%
                Number of reads unmapped: other |       791507
                     % of reads unmapped: other |       2.55%
                                  CHIMERIC READS:
                       Number of chimeric reads |       0
                            % of chimeric reads |       0.00%