Question

Lower unique mapping rate after trimming

6.1 years ago

lilingjoyo ▴ 40

Hello everyone,

I ran into a tricky problem while processing total stranded RNA-seq data. I used Trimmomatic to trim adapter sequences and low-quality reads, then used STAR to map the cleaned reads to GRCh38. However, my unique mapping rate is much lower than that of my colleague, who maps the raw data with STAR directly. He thinks the reads have already been trimmed by the Illumina system, so there is no need to trim again. All parameters are the same between us: all defaults. This does not seem reasonable. Shouldn't trimming give a better mapping result? I tested several samples and they all give the same result. Here are the mapping results for one sample, with and without trimming.

Without trimming:

                             Started job on |   Nov 28 13:28:58
                         Started mapping on |   Nov 28 13:29:25
                                Finished on |   Nov 28 14:42:17
   Mapping speed, Million of reads per hour |   122.99

                      Number of input reads |   149364482
                  Average input read length |   276
                                UNIQUE READS:
               Uniquely mapped reads number |   115685706
                    Uniquely mapped reads % |   77.45%
                      Average mapped length |   276.34
                   Number of splices: Total |   43804823
        Number of splices: Annotated (sjdb) |   42830914
                   Number of splices: GT/AG |   43313281
                   Number of splices: GC/AG |   252366
                   Number of splices: AT/AC |   27822
           Number of splices: Non-canonical |   211354
                  Mismatch rate per base, % |   0.34%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.92
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.66
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   31227061
         % of reads mapped to multiple loci |   20.91%
    Number of reads mapped to too many loci |   55837
         % of reads mapped to too many loci |   0.04%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.00%
             % of reads unmapped: too short |   1.53%
                 % of reads unmapped: other |   0.07%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%

With trimming:

                             Started job on |   Nov 29 10:41:38
                         Started mapping on |   Nov 29 10:45:48
                                Finished on |   Nov 29 11:19:15
   Mapping speed, Million of reads per hour |   258.01

                      Number of input reads |   143843180
                  Average input read length |   239
                                UNIQUE READS:
               Uniquely mapped reads number |   73720415
                    Uniquely mapped reads % |   51.25%
                      Average mapped length |   244.17
                   Number of splices: Total |   24431193
        Number of splices: Annotated (sjdb) |   23918159
                   Number of splices: GT/AG |   24164196
                   Number of splices: GC/AG |   117470
                   Number of splices: AT/AC |   13993
           Number of splices: Non-canonical |   135534
                  Mismatch rate per base, % |   0.34%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.94
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.57
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   11930748
         % of reads mapped to multiple loci |   8.29%
    Number of reads mapped to too many loci |   42329
         % of reads mapped to too many loci |   0.03%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |   0.00%
             % of reads unmapped: too short |   40.37%
                 % of reads unmapped: other |   0.06%
                              CHIMERIC READS:
                   Number of chimeric reads |   0
                        % of chimeric reads |   0.00%
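To compare the two summaries side by side rather than by eye, here is a minimal Python sketch. It assumes only the `key | value` layout visible in the logs above; the function names `parse_star_log` and `as_number` are made up for illustration, and the embedded text is a short excerpt of the two logs, not the full files.

```python
def parse_star_log(text):
    """Parse 'key | value' lines from a STAR summary like the ones above."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        key, value = (part.strip() for part in line.split("|", 1))
        stats[key] = value
    return stats


def as_number(value):
    """Convert a count such as '149364482' or a percentage such as '77.45%' to float."""
    return float(value.rstrip("%"))


# Excerpts copied from the two logs in this post.
untrimmed = parse_star_log("""
                      Number of input reads |   149364482
                                UNIQUE READS:
                    Uniquely mapped reads % |   77.45%
         % of reads mapped to multiple loci |   20.91%
             % of reads unmapped: too short |   1.53%
""")
trimmed = parse_star_log("""
                      Number of input reads |   143843180
                                UNIQUE READS:
                    Uniquely mapped reads % |   51.25%
         % of reads mapped to multiple loci |   8.29%
             % of reads unmapped: too short |   40.37%
""")

# Print each metric for both runs on one line.
for key in untrimmed:
    print(f"{key}: {untrimmed[key]} (raw) vs {trimmed[key]} (trimmed)")
```

In practice the two strings would be replaced by the contents of each run's summary file.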

RNA-Seq STAR trimmomatic • 2.5k views

ADD COMMENT • link 6.1 years ago by lilingjoyo ▴ 40

If the mapping result of trimmed data is worse, is it necessary to do trimming?

ADD REPLY • link 6.1 years ago by lilingjoyo ▴ 40

Strictly speaking, trimming is not necessary, since STAR should take care of any extraneous sequence by soft-clipping.
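Soft-clipped bases appear as `S` operations in the CIGAR field of the SAM/BAM output, so you can check how much clipping the aligner actually did. A minimal sketch, assuming you already have the CIGAR strings in hand; the function name `soft_clipped_bases` and the two example CIGAR strings are hypothetical, not taken from this data:

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return the number of read bases soft-clipped (S operations) in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op == "S")


# Hypothetical examples: a read clipped at both ends, and a spliced read with no clipping.
print(soft_clipped_bases("12S100M38S"))
print(soft_clipped_bases("75M1200N75M"))
```

If raw reads show a lot of clipping at the 3' end, that would suggest adapter or low-quality sequence is present and being handled at the alignment stage.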

ADD REPLY • link 6.1 years ago by GenoMax 148k

Without trimming:

  MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   31227061
         % of reads mapped to multiple loci |   20.91%
    Number of reads mapped to too many loci |   55837
         % of reads mapped to too many loci |   0.04%

With trimming:

  MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   11930748
         % of reads mapped to multiple loci |   8.29%
    Number of reads mapped to too many loci |   42329
         % of reads mapped to too many loci |   0.03%

Basically, when you do counting, about 20.9% of your reads are multi-mappers you can't use without trimming, compared to about 8.3% with trimming.
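It also helps to look at all categories together, since the percentages in each summary should add up to roughly 100%. A quick tally as a sketch, using only the percentages from the two logs in the question (the short category labels are my own shorthand):

```python
# Percentages copied from the two STAR summaries in the question.
runs = {
    "without trimming": {
        "unique": 77.45,
        "multiple loci": 20.91,
        "too many loci": 0.04,
        "unmapped: too many mismatches": 0.00,
        "unmapped: too short": 1.53,
        "unmapped: other": 0.07,
    },
    "with trimming": {
        "unique": 51.25,
        "multiple loci": 8.29,
        "too many loci": 0.03,
        "unmapped: too many mismatches": 0.00,
        "unmapped: too short": 40.37,
        "unmapped: other": 0.06,
    },
}

# Each run's categories should sum to about 100%.
totals = {name: round(sum(cats.values()), 2) for name, cats in runs.items()}

# Change in percentage points per category (trimmed minus untrimmed).
changes = {
    category: round(runs["with trimming"][category] - runs["without trimming"][category], 2)
    for category in runs["without trimming"]
}

for name, total in totals.items():
    print(f"{name}: categories sum to {total}%")
for category, delta in changes.items():
    print(f"{category}: {delta:+.2f} points")
```

The tally shows that the unique and multi-mapping percentages both drop after trimming because the "unmapped: too short" category grows from 1.53% to 40.37%.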

ADD REPLY • link 6.1 years ago by GenoMax 148k

Thanks genomax. Can you make it clearer whether or not I need to trim, given these results? Without trimming, the % of reads mapped to multiple loci is higher, but the uniquely mapped reads % is higher too. That seems unreasonable.

ADD REPLY • link 6.1 years ago by lilingjoyo ▴ 40

Do you know if your data needs to be trimmed (i.e. has some extraneous sequence)? If that is not the case you may be adding some bias.

ADD REPLY • link 6.1 years ago by GenoMax 148k

Hi genomax, thanks. That's also what I thought. I think the unique mapping rate is the top priority for assessing data quality. If it is higher without trimming, then I will just skip the trimming step. If things go wrong during mapping, I will trace back and check it.

ADD REPLY • link 6.1 years ago by lilingjoyo ▴ 40