Question

STAR outputs interpretation

7.4 years ago

XBria ▴ 90

Hi everyone,

I am working on RNA-seq data. STAR is mapping against chromosome X only, and the data are down-sampled, which is why the uniquely mapped rate is close to 100%. (Paired-end reads, 75 bp forward and 75 bp reverse.)

Can I say that the mapping is improved by using these parameters?

--outFilterMatchNmin 20 --seedSearchStartLmax 30 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMismatchNoverLmax 9

* I tested 3 samples out of 12, and all three give exactly the same uniquely mapped rate of 98.24%!

The following is the output without these parameters:

                             Started job on |   Dec 06 09:32:03
                         Started mapping on |   Dec 06 09:32:11
                                Finished on |   Dec 06 09:33:52
   Mapping speed, Million of reads per hour |   47.10

                      Number of input reads |   1321477
                  Average input read length |   152
                                UNIQUE READS:
               Uniquely mapped reads number |   1281329
                    Uniquely mapped reads % |   96.96%
                      Average mapped length |   151.01
                   Number of splices: Total |   701323
        Number of splices: Annotated (sjdb) |   693399
                   Number of splices: GT/AG |   697684
                   Number of splices: GC/AG |   1968
                   Number of splices: AT/AC |   703
           Number of splices: Non-canonical |   968
                  Mismatch rate per base, % |   0.43%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.53
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.29
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   16029
         % of reads mapped to multiple loci |   1.21%
    Number of reads mapped to too many loci |   286
         % of reads mapped to too many loci |   0.02%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.00%
             % of reads unmapped: too short |   1.80%
                 % of reads unmapped: other |   0.01%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%

And with those parameters:

                             Started job on |   Dec 06 09:45:39
                         Started mapping on |   Dec 06 09:45:43
                                Finished on |   Dec 06 09:48:04
   Mapping speed, Million of reads per hour |   33.74

                      Number of input reads |   1321477
                  Average input read length |   152
                                UNIQUE READS:
               Uniquely mapped reads number |   1298240
                    Uniquely mapped reads % |   98.24%
                      Average mapped length |   150.22
                   Number of splices: Total |   705133
        Number of splices: Annotated (sjdb) |   696785
                   Number of splices: GT/AG |   701438
                   Number of splices: GC/AG |   1991
                   Number of splices: AT/AC |   710
           Number of splices: Non-canonical |   994
                  Mismatch rate per base, % |   0.45%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.51
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.29
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   22727
         % of reads mapped to multiple loci |   1.72%
    Number of reads mapped to too many loci |   406
         % of reads mapped to too many loci |   0.03%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.00%
             % of reads unmapped: too short |   0.00%
                 % of reads unmapped: other |   0.01%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%
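
To see where the extra reads come from, the two logs can be compared field by field. The following is a minimal sketch, not a definitive tool: the parser function `parse_star_log` and the two excerpt strings are hypothetical names, and the numbers are copied from the two outputs above. It shows that the reads gained across the mapped categories roughly equal the reads that were previously rejected as "too short".

```python
def parse_star_log(text):
    """Parse 'name | value' lines of a STAR Log.final.out into a dict of floats."""
    metrics = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        key, value = (part.strip() for part in line.split("|", 1))
        try:
            metrics[key] = float(value.rstrip("%"))
        except ValueError:
            continue  # non-numeric values such as timestamps
    return metrics


# Excerpts copied from the two logs in this post (default vs. relaxed filters).
DEFAULT_LOG = """
Number of input reads |   1321477
Uniquely mapped reads number |   1281329
Number of reads mapped to multiple loci |   16029
Number of reads mapped to too many loci |   286
% of reads unmapped: too short |   1.80%
"""
RELAXED_LOG = """
Number of input reads |   1321477
Uniquely mapped reads number |   1298240
Number of reads mapped to multiple loci |   22727
Number of reads mapped to too many loci |   406
% of reads unmapped: too short |   0.00%
"""

COUNT_KEYS = [
    "Uniquely mapped reads number",
    "Number of reads mapped to multiple loci",
    "Number of reads mapped to too many loci",
]

default = parse_star_log(DEFAULT_LOG)
relaxed = parse_star_log(RELAXED_LOG)

# Reads gained in each category after relaxing the filters.
gained = {key: int(relaxed[key] - default[key]) for key in COUNT_KEYS}
total_gained = sum(gained.values())

# Approximate number of reads rejected as "too short" with the default filters
# (the log only reports a percentage rounded to two decimals).
too_short_default = round(
    default["% of reads unmapped: too short"] / 100 * default["Number of input reads"]
)

for key, value in gained.items():
    print(f"{key}: +{value}")
print(f"Total gained: {total_gained}, previously 'too short': ~{too_short_default}")
```

With real files, the excerpt strings would be replaced by the contents of each run's `Log.final.out`. The point of the comparison is that the higher percentage comes entirely from reads the default filters had discarded, so the question becomes whether those particular alignments can be trusted.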

RNA-Seq • 4.7k views

ADD COMMENT • link 7.4 years ago by XBria ▴ 90

Okay, you increased the fraction of mapped reads, but how do you know those alignments are also "correct"? The percentage aligned, although informative, shouldn't be your only criterion for optimization.

I also wouldn't worry about a difference of only 2%.

ADD REPLY • link 7.4 years ago by WouterDeCoster 48k
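
One rough way to look beyond the percentage is to estimate how much of each newly gained read actually aligns. The sketch below is only a back-of-the-envelope calculation from the two logs in the question. It assumes that reads uniquely mapped with the default settings keep the same alignments under the relaxed settings, which may not hold exactly (some could become multi-mappers). The 0.66 value is what I recall as STAR's default for `--outFilterMatchNminOverLread` and `--outFilterScoreMinOverLread`; treat it as an assumption and check the manual for your version.

```python
# Values copied from the two logs in the question.
unique_default, mean_len_default = 1281329, 151.01   # default filters
unique_relaxed, mean_len_relaxed = 1298240, 150.22   # relaxed filters
read_length = 152                                    # average input read length (both mates)

# Assumption: recalled STAR default minimum matched fraction; verify in the manual.
ASSUMED_DEFAULT_MIN_FRACTION = 0.66

# Uniquely mapped reads gained by relaxing the filters.
gained_reads = unique_relaxed - unique_default

# Total aligned bases in each run, reconstructed from the reported averages.
# Assumption: reads unique in the default run align identically in the relaxed run,
# so the difference in total bases belongs to the gained reads.
gained_bases = unique_relaxed * mean_len_relaxed - unique_default * mean_len_default

mean_len_gained = gained_bases / gained_reads
fraction_aligned = mean_len_gained / read_length

print(f"Gained uniquely mapped reads: {gained_reads}")
print(f"Estimated mean mapped length of gained reads: {mean_len_gained:.1f}")
print(f"Estimated aligned fraction of gained reads: {fraction_aligned:.2f}")
print(f"Below assumed default threshold: {fraction_aligned < ASSUMED_DEFAULT_MIN_FRACTION}")
```

Because the logs round the average lengths to two decimals, the estimate is only good to about one base. Even so, it suggests the gained reads align over roughly 90 of their 152 bases, far less than the reads that mapped with the default filters, which is a reason to inspect them (for example in a genome browser) before counting them as an improvement.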

Ref: Improving the mapping rate by aligner parameters

XBria: Creating new threads with variations of your earlier questions is not going to help much. At the high end of alignment %, you are splitting hairs. As @Wouter said above, whether those 2 additional % add meaningfully to your analysis is questionable. You should be moving on to the actual differential expression analysis.

ADD REPLY • link 7.4 years ago by GenoMax 151k

Dear GenoMax, could you share a link that clearly explains the trade-offs? I could not find any comprehensive resources on this issue, and I need to learn more about it. Thanks for your understanding, as I am a beginner in this field.

ADD REPLY • link 7.4 years ago by XBria ▴ 90

What trade-offs are you referring to?

ADD REPLY • link 7.4 years ago by GenoMax 151k

I mean choosing the best among different optimization results (which may mean dropping some of the parameters). How can we clearly recognize whether we are heading in the right direction? Based on which criteria?

ADD REPLY • link 7.4 years ago by XBria ▴ 90

The aim is to improve mapping through parameter settings. How can I know whether the alignments are correct? Doesn't this show a good improvement?

ADD REPLY • link 7.4 years ago by XBria ▴ 90