Question

Low final mapping rate by TopHat in spite of high initial Bowtie mapping

0

Entering edit mode

9.3 years ago

drtamermansour ▴ 40

Hi,

I am running TopHat2/2.0.14 using this code:

    tophat \
    --transcriptome-index "${transcriptome_index}" \
    --num-threads 4 \
    --output-dir "${output_dir}" \
    --coverage-search \
    --microexon-search \
    --prefilter-multihits \
    --library-type "${lib_type}" \
    "${Bowtie2_genome_index_base}" $R1_And_Singletons_reads $R2_reads

The final align_summary always show very low mapping rate. This is one example:

    Left reads:
          Input     :   9677854
           Mapped   :   1573190 (16.3% of input)
            of these:     33728 ( 2.1%) have multiple alignments (9871 have >20)
    Right reads:
          Input     :   9677930
           Mapped   :   1549704 (16.0% of input)
            of these:     33265 ( 2.1%) have multiple alignments (9793 have >20)
    Unpaired reads:
          Input     :    376485
           Mapped   :     58233 (15.5% of input)
            of these:      1312 ( 2.3%) have multiple alignments (414 have >20)
    16.1% overall read mapping rate.

    Aligned pairs:   1001665
         of these:     24413 ( 2.4%) have multiple alignments
                    1666 ( 0.2%) are discordant alignments
    10.3% concordant pair alignment rate.

I checked the log files of the TopHat run. Surprisingly, I found the initial Bowtie mapping rate very high. Here is the bowtie.sample1.R1_And_Singletons_reads.log

    10064133 reads; of these:
      10064133 (100.00%) were unpaired; of these:
        3168847 (31.49%) aligned 0 times
        6305852 (62.66%) aligned exactly 1 time
        589434 (5.86%) aligned >1 times
    68.51% overall alignment rate

This is the running log

    [2015-08-16 17:22:06] Beginning TopHat run (v2.0.14)
    -----------------------------------------------
    [2015-08-16 17:22:06] Checking for Bowtie
                  Bowtie version:        2.2.3.0
    [2015-08-16 17:22:06] Checking for Bowtie index files (transcriptome)..
    [2015-08-16 17:22:06] Checking for Bowtie index files (genome)..
    [2015-08-16 17:22:06] Checking for reference FASTA file
    [2015-08-16 17:22:06] Generating SAM header for /Bowtie2Index/gi
    [2015-08-16 17:22:15] Reading known junctions from GTF file
    [2015-08-16 17:22:15] Pre-filtering multi-mapped left reads
    [2015-08-16 17:22:15] Mapping sample1.R1_And_Singletons_reads to genome genome with Bowtie2 
    [2015-08-16 17:35:01] Pre-filtering multi-mapped right reads
    [2015-08-16 17:35:01] Mapping sample1.R2_reads to genome genome with Bowtie2 
    [2015-08-16 17:48:01] Preparing reads
         left reads: min. length=38, max. length=81, 9677465 kept reads (10224 discarded)
        right reads: min. length=21, max. length=81, 9677279 kept reads (10410 discarded)
    [2015-08-16 17:54:34] Using pre-built transcriptome data..
    [2015-08-16 17:54:34] Mapping left_kept_reads to transcriptome equ with Bowtie2 
    [2015-08-16 17:57:49] Mapping right_kept_reads to transcriptome equ with Bowtie2 
    [2015-08-16 18:00:40] Resuming TopHat pipeline with unmapped reads
    [2015-08-16 18:06:02] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/3)
    [2015-08-16 18:08:38] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/3)
    [2015-08-16 18:11:20] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/3)
    [2015-08-16 18:20:21] Mapping right_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/3)
    [2015-08-16 18:22:46] Mapping right_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/3)
    [2015-08-16 18:25:29] Mapping right_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/3)
    [2015-08-16 18:31:00] Searching for junctions via segment mapping
    [2015-08-16 22:48:52] Retrieving sequences for splices
    [2015-08-16 22:53:39] Indexing splices
    [2015-08-16 23:07:13] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
    [2015-08-16 23:09:46] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
    [2015-08-16 23:12:19] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
    [2015-08-16 23:17:13] Joining segment hits
    [2015-08-16 23:20:47] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
    [2015-08-16 23:23:20] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
    [2015-08-16 23:25:56] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
    [2015-08-16 23:31:21] Joining segment hits
    [2015-08-16 23:34:53] Reporting output tracks
    -----------------------------------------------
    [2015-08-16 23:44:43] A summary of the alignment counts can be found in /projectX/tophat_output/tophat_sample1/align_summary.txt
    [2015-08-16 23:44:43] Run complete: 06:22:36 elapsed

I am doing the analysis of several RNAseq datasets. Different read lengths and different library preparations so it is unlikely to be RNAseq dependent problem. I am working with a model organism with a good reference genome.

Thank you

alignment TopHat RNA-Seq • 3.4k views

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by drtamermansour ▴ 40

0

Entering edit mode

Why the name of your paired end files are different?

$R1_And_Singletons_reads $R2_reads

is this just the name of the files? Or they are really different? Just for knowing.

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by tiago211287 ★ 1.5k

0

Entering edit mode

I add the singletons to the end of the R1 file according to TopHat manual.

ADD REPLY • link 9.3 years ago by drtamermansour ▴ 40

Ram · Answer 1 · 2015-08-23

0

Entering edit mode

9.3 years ago

drtamermansour ▴ 40

I found that --prefilter-multihits is the reason why this is happening. I think this is a bug in TopHat because the log files show that only few hundred reads are filtered out as multi hits. I tried TopHat2 version 2.1.0 but the problem is still the same.

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.3 years ago by drtamermansour ▴ 40