STAR mapping results

Asked by mxlsherry1992, 5.2 years ago

Hi,

I used STAR to map my RNA-seq data to the genome. Here is the output file with the mapping statistics, but I'm having a hard time understanding it:

                                 Started job on |    Aug 09 16:35:19
                             Started mapping on |   Aug 09 16:35:48
                                    Finished on |   Aug 09 17:28:18
       Mapping speed, Million of reads per hour |   21.41

                          Number of input reads |   18734806
                      Average input read length |   249
                                    UNIQUE READS:
                   Uniquely mapped reads number |   10373363
                        Uniquely mapped reads % |   55.37%
                          Average mapped length |   242.88
                       Number of splices: Total |   8480091
            Number of splices: Annotated (sjdb) |   7969501
                       Number of splices: GT/AG |   8357360
                       Number of splices: GC/AG |   77588
                       Number of splices: AT/AC |   6734
               Number of splices: Non-canonical |   38409
                      Mismatch rate per base, % |   0.28%
                         Deletion rate per base |   0.03%
                        Deletion average length |   2.76
                        Insertion rate per base |   0.02%
                       Insertion average length |   2.49
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   1489776
             % of reads mapped to multiple loci |   7.95%
        Number of reads mapped to too many loci |   6179
             % of reads mapped to too many loci |   0.03%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |   0
       % of reads unmapped: too many mismatches |   0.00%
            Number of reads unmapped: too short |   6856355
                 % of reads unmapped: too short |   36.60%
                Number of reads unmapped: other |   9133
                     % of reads unmapped: other |   0.05%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%

I just want to know whether the mapping rate is simply "Uniquely mapped reads %" (55.37%). When I used HISAT2, the overall alignment rate was the sum of several categories. With STAR, do I need to add some of these numbers together?

Tags: RNA-Seq

It depends on what you're looking for.

If you are only interested in uniquely mapped reads (which is what you want for most use cases, e.g. expression analysis), then the number is what it is (rather on the low side, judging from the information we have).

If, on the other hand, you want an idea of how many reads mapped in total, you need to add the uniquely mapped reads to the multi-mapped ones to get the final number.

Different aligners will report different numbers of aligned reads, but they should all be in roughly the same range.

Reply by lieven.sterck, 5.2 years ago
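
A minimal sketch of that arithmetic, using the counts from the Log.final.out in the question. It assumes the `label | value` layout shown above; `parse_star_log` and `mapping_rates` are hypothetical helper names, not part of STAR. For a real run you could pass it the text of STAR's `Log.final.out` (e.g. `parse_star_log(open("Log.final.out").read())`, path assumed).

```python
# Counts copied from the Log.final.out in the question (percentage lines omitted).
LOG_EXCERPT = """\
                          Number of input reads |   18734806
                   Uniquely mapped reads number |   10373363
        Number of reads mapped to multiple loci |   1489776
        Number of reads mapped to too many loci |   6179
  Number of reads unmapped: too many mismatches |   0
            Number of reads unmapped: too short |   6856355
                Number of reads unmapped: other |   9133
"""


def parse_star_log(text):
    """Parse the 'label | value' lines of a STAR Log.final.out into a dict.

    Numeric values (percentages with the '%' stripped) become floats; anything
    else, such as timestamps, is kept as a string. Lines without '|', like the
    "UNIQUE READS:" section headers, are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        label, value = (part.strip() for part in line.split("|", 1))
        try:
            stats[label] = float(value.rstrip("%"))
        except ValueError:
            stats[label] = value
    return stats


def mapping_rates(stats):
    """Percentage of input reads falling into each mapping category."""
    total = stats["Number of input reads"]
    unique = stats["Uniquely mapped reads number"]
    multi = stats["Number of reads mapped to multiple loci"]
    too_many = stats["Number of reads mapped to too many loci"]
    unmapped = (stats["Number of reads unmapped: too many mismatches"]
                + stats["Number of reads unmapped: too short"]
                + stats["Number of reads unmapped: other"])
    return {
        "unique": 100 * unique / total,
        "unique + multi": 100 * (unique + multi) / total,
        "unique + multi + too many loci": 100 * (unique + multi + too_many) / total,
        "unmapped": 100 * unmapped / total,
    }


if __name__ == "__main__":
    for name, pct in mapping_rates(parse_star_log(LOG_EXCERPT)).items():
        print(f"{name}: {pct:.2f}%")
```

With these counts the unique-only rate is 55.37%, and adding the multi-mappers gives about 63.3%. In this log the mapped and unmapped categories add up exactly to the 18,734,806 input reads, so the mapped and unmapped percentages sum to 100%.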

Got it! Thank you! I will use the sum of "Uniquely mapped reads %", "% of reads mapped to multiple loci" and "% of reads mapped to too many loci".

Reply by mxlsherry1992, 5.2 years ago

Given this high proportion of unmapped reads, is that something you expected? If this is just normal sequencing data from, for example, human cells, you can get such a high proportion of unmapped reads if you forgot to trim the reads or if the paired-end read files are not sorted correctly.

Reply by caggtaagtat, 5.2 years ago

Hi, thanks for the reply. I already trimmed the reads. I thought STAR would have a relatively low mapping rate compared to HISAT2, StringTie...?

Reply by mxlsherry1992, 5.2 years ago

I have never worked with HISAT2, but I would guess the mapping rates should be about the same. Out of curiosity, did you check the rRNA content of the samples, for example with SortMeRNA? When analysing relatively old data, I came across rRNA contents of around 20-40% of total reads even after mRNA enrichment, so that is theoretically possible.

Reply by caggtaagtat, 5.2 years ago

What sort of trimming did you perform? Don't use hard cutoffs; they will alter downstream analysis: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0956-2#Fig5

IMO, you should not do any trimming on RNA-seq data; STAR will handle low-quality bases. At most, do very soft trimming of the ends. I think this is your problem: you've trimmed too much, so STAR has ignored those reads. You could alter STAR's parameters to accept shorter reads, but I think trimming is the issue.

Reply by Mark, 5.1 years ago
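
For context on the "too short" category: STAR only reports an alignment if its score and its number of matched bases reach a fraction of the read length set by `--outFilterScoreMinOverLread` and `--outFilterMatchNminOverLread` (both 0.66 by default), where for paired-end reads the read length is the sum of both mates. These are the parameters one would lower to accept shorter alignments. The sketch below is a simplified model of that rule (matched-bases criterion only); the function names, the 2 x 125 bp pair and the relaxed value 0.3 are hypothetical, chosen only for illustration.

```python
# STAR's default for both --outFilterScoreMinOverLread and
# --outFilterMatchNminOverLread.
STAR_DEFAULT_FRACTION = 0.66


def min_matched_bases(mate_lengths, fraction=STAR_DEFAULT_FRACTION):
    """Matched bases needed to pass the length-normalized filter.

    For paired-end reads the normalization uses the sum of both mates' lengths.
    """
    return fraction * sum(mate_lengths)


def classified_too_short(matched_bases, mate_lengths, fraction=STAR_DEFAULT_FRACTION):
    """Simplified model: True if the best alignment would fail the filter.

    Only the matched-bases criterion is modelled; STAR also applies the
    analogous alignment-score criterion.
    """
    return matched_bases < min_matched_bases(mate_lengths, fraction)


if __name__ == "__main__":
    mates = [125, 125]  # hypothetical 2 x 125 bp pair
    print("bases needed with defaults:", min_matched_bases(mates))
    print("only one mate aligns, defaults:", classified_too_short(125, mates))
    print("only one mate aligns, fraction 0.3:", classified_too_short(125, mates, 0.3))
```

The example also shows why a pair where only one mate aligns (about half the combined length) ends up as "too short" under the defaults.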

Hi, here is the command I used, in case that is where the error comes from:

java -jar /tools/trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 1 -phred33 \
    /home/Chan9-1_R1_001.fastq /home/Chan9-1_R2_001.fastq \
    /home/clean_data/Chan9-1_R1_left_paired_trimmed.fq /home/Chan9-1_R1_left_unpaired_trimmed.fq \
    /home/Chan9-1_R2_right_paired_trimmed.fq /home/Chan9-1_R2_right_unpaired_trimmed.fq \
    ILLUMINACLIP:/tools/trimmomatic-0.36/adapters/TruSeq3-PE.fa:2:30:10 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:36

Reply by mxlsherry1992, 5.0 years ago
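
Trimmomatic applies the steps in the order given on the command line. As a rough illustration of how the quality steps in this command shorten or drop reads, here is a simplified Python sketch of LEADING, TRAILING, SLIDINGWINDOW and MINLEN. It is not Trimmomatic's implementation: adapter clipping is not modelled, the exact cut point inside a failing window may differ, and the function names and quality values are hypothetical. For comparison, the widely copied example command on the Trimmomatic website uses SLIDINGWINDOW:4:15 rather than 4:25.

```python
def sliding_window_cut(quals, window=4, required=25):
    """Number of bases kept by a simplified sliding-window trim.

    Scans 5' -> 3' and cuts at the start of the first window whose mean
    quality is below `required`.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < required:
            return start
    return len(quals)


def trim_read(quals, leading=3, trailing=3, window=4, required=25, minlen=36):
    """Apply simplified LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps,
    in that order, to a list of Phred scores.

    Returns the kept scores, or None if the read is dropped by MINLEN.
    """
    start, end = 0, len(quals)
    while start < end and quals[start] < leading:      # LEADING
        start += 1
    while end > start and quals[end - 1] < trailing:   # TRAILING
        end -= 1
    kept = quals[start:end]
    kept = kept[:sliding_window_cut(kept, window, required)]  # SLIDINGWINDOW
    return kept if len(kept) >= minlen else None              # MINLEN


if __name__ == "__main__":
    # Hypothetical read: 60 bases at Q30 followed by 40 bases at Q20.
    read = [30] * 60 + [20] * 40
    for required in (15, 25):
        kept = trim_read(read, required=required)
        print(f"SLIDINGWINDOW:4:{required} ->", None if kept is None else len(kept))
```

In this toy read, a Q20 tail survives a threshold of 15 but is cut away at 25, so a stricter window threshold produces shorter reads.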

This high proportion of "too short" reads could also result from paired-end reads that are not sorted correctly, i.e. the R1 and R2 files are out of sync.

Reply by caggtaagtat, 5.1 years ago
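
One way to test the unsorted-mates explanation is to confirm that the R1 and R2 files list the same read IDs in the same order. A minimal sketch, assuming uncompressed FASTQ with exactly four lines per record and Illumina-style headers (either a `/1`/`/2` suffix or a comment after a space); the script and function names are hypothetical, and the two file paths are passed on the command line, e.g. `python check_pairs.py R1_paired.fq R2_paired.fq`.

```python
import sys
from itertools import zip_longest


def read_ids(fastq_lines):
    """Yield the read ID of each 4-line FASTQ record.

    The comment after the first whitespace (e.g. '1:N:0:ACGT') and a trailing
    '/1' or '/2' mate suffix are removed so matching R1/R2 IDs compare equal.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            rid = line.split()[0]
            if rid.endswith(("/1", "/2")):
                rid = rid[:-2]
            yield rid


def first_out_of_sync(r1_lines, r2_lines):
    """Return (record_index, r1_id, r2_id) for the first mismatching pair,
    or None if both files list the same IDs in the same order.

    A missing record in the shorter file is reported as None.
    """
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for n, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return n, id1, id2
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as r1, open(sys.argv[2]) as r2:
        print(first_out_of_sync(r1, r2))
```

If it prints `None`, the mates are in the same order and out-of-sync files can be ruled out as the cause.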