Question

Comparing mapping statistics on S. pombe alignment (STAR vs. TopHat)

0

6.7 years ago

dho322 • 0

Hi all,

I am hoping to get some insight into what is happening here or any suggestions.

I am aligning total RNA-seq, single-end data to S. pombe with STAR and TopHat, but I am getting two very different unique-mapping statistics:

STAR:

                      Number of input reads |   20416529
                  Average input read length |   47
                                UNIQUE READS:
               Uniquely mapped reads number |   2622002
                    Uniquely mapped reads % |   12.84%
                      Average mapped length |   48.83

TopHat:

    Reads:
    Input     :  20416529
    Mapped   :  19397908 (95.0% of input)
    95.0% overall read mapping rate.

12.84% vs. 95% is a pretty big difference.

Any ideas?

RNA-Seq pombe star alignment tophat • 1.7k views

ADD COMMENT • link updated 6.7 years ago by Michael 55k • written 6.7 years ago by dho322 • 0

0

Can you please post the command lines you used?

ADD REPLY • link 6.7 years ago by Dan D 7.4k

0

Because this is total RNA-seq, most of your data are likely rRNA reads, which would likely be multi-mapping and hence not counted as uniquely mapped by STAR.

Please don't use TopHat for any current projects.

ADD REPLY • link 6.7 years ago by GenoMax 148k

0

Hmm, this could be the case with rRNA contamination. Looking back at the library prep, there doesn't seem to be an rRNA depletion step.

ADD REPLY • link 6.7 years ago by dho322 • 0

0

If the prep was for total RNAseq then that is expected. If the prep was supposed to be for mRNAseq with ribo-depletion then ..

ADD REPLY • link 6.7 years ago by GenoMax 148k

score 0 · Answer 1 · 2018-04-19

0

6.7 years ago

Michael 55k

Besides the low number of uniquely mapping reads, which is suspicious, you are confusing "Uniquely mapped" and "Mapped".

Note the "uniquely" in the STAR output, while TopHat gives you the total number of mapped reads. STAR does not give you the sum of all mapped reads in Log.final.out; instead, look for

                Uniquely mapped reads number  |     96821769
                     Uniquely mapped reads %  |     92.13%

and

      Number of reads mapped to multiple loci  |     6264908
           % of reads mapped to multiple loci  |     5.96%

The sum of these is the total aligned number/percentage.

In addition, if you have a filter on the maximum number of multimapping locations, you could also include the reads that exceeded it:

      Number of reads mapped to too many loci  |     62660
           % of reads mapped to too many loci  |     0.06%
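To make the summing concrete, here is a minimal Python sketch that parses the "key | value" lines of a STAR Log.final.out and adds up the unique and multi-mapped counts. The function names (`parse_star_log`, `total_mapped`) and the `include_too_many` switch are made up for illustration and are not part of STAR; the sample text is just the example lines quoted above, not the numbers from the question.

```python
# Example lines quoted above (not the poster's own Log.final.out).
SAMPLE_LOG = """
                Uniquely mapped reads number  |     96821769
                     Uniquely mapped reads %  |     92.13%
      Number of reads mapped to multiple loci  |     6264908
           % of reads mapped to multiple loci  |     5.96%
      Number of reads mapped to too many loci  |     62660
           % of reads mapped to too many loci  |     0.06%
"""


def parse_star_log(text):
    """Map each 'key | value' line to its value string (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and section headers
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def total_mapped(stats, include_too_many=False):
    """Return (read count, percentage) of unique plus multi-mapped reads."""
    count_keys = ["Uniquely mapped reads number",
                  "Number of reads mapped to multiple loci"]
    pct_keys = ["Uniquely mapped reads %",
                "% of reads mapped to multiple loci"]
    if include_too_many:
        # Optionally add reads that exceeded the multimapper limit.
        count_keys.append("Number of reads mapped to too many loci")
        pct_keys.append("% of reads mapped to too many loci")
    count = sum(int(stats[k]) for k in count_keys)
    pct = round(sum(float(stats[k].rstrip("%")) for k in pct_keys), 2)
    return count, pct


if __name__ == "__main__":
    stats = parse_star_log(SAMPLE_LOG)
    print(total_mapped(stats))
    print(total_mapped(stats, include_too_many=True))
```

In practice you would read the text from your own Log.final.out instead of the embedded sample. The total computed this way is the figure to compare against TopHat's overall mapping rate, rather than the "Uniquely mapped reads %" line alone.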

ADD COMMENT • link 6.7 years ago by Michael 55k

0

Yes.

I had used

 tophat -g 1

when running TopHat, which from my understanding should only give me reads that map to one locus in the resulting BAM file.

I do think that the number I am seeing in STAR reflects rRNA contamination, which would map to multiple locations, but it is surprising that TopHat is giving something else.

ADD REPLY • link 6.7 years ago by dho322 • 0

1

This likely does not explain the difference, but do not confuse 'uniquely mapped' with 'report a single locus per read' (which is what you specify for TopHat), as mentioned by @Michael Dondrup. Even with TopHat's -g 1 option you will still get reads that map to several loci; only one location is reported for each, which is not the same as uniquely mapped!

ADD REPLY • link 6.7 years ago by lieven.sterck 15k