Low final mapping rate by TopHat in spite of high initial Bowtie mapping
1
0
Entering edit mode
9.3 years ago

Hi,

I am running TopHat2/2.0.14 using this code:

    tophat \
    --transcriptome-index "${transcriptome_index}" \
    --num-threads 4 \
    --output-dir "${output_dir}" \
    --coverage-search \
    --microexon-search \
    --prefilter-multihits \
    --library-type "${lib_type}" \
    "${Bowtie2_genome_index_base}" $R1_And_Singletons_reads $R2_reads

The final align_summary always show very low mapping rate. This is one example:

    Left reads:
          Input     :   9677854
           Mapped   :   1573190 (16.3% of input)
            of these:     33728 ( 2.1%) have multiple alignments (9871 have >20)
    Right reads:
          Input     :   9677930
           Mapped   :   1549704 (16.0% of input)
            of these:     33265 ( 2.1%) have multiple alignments (9793 have >20)
    Unpaired reads:
          Input     :    376485
           Mapped   :     58233 (15.5% of input)
            of these:      1312 ( 2.3%) have multiple alignments (414 have >20)
    16.1% overall read mapping rate.

    Aligned pairs:   1001665
         of these:     24413 ( 2.4%) have multiple alignments
                    1666 ( 0.2%) are discordant alignments
    10.3% concordant pair alignment rate.

I checked the log files of the TopHat run. Surprisingly, I found the initial Bowtie mapping rate very high. Here is the bowtie.sample1.R1_And_Singletons_reads.log

    10064133 reads; of these:
      10064133 (100.00%) were unpaired; of these:
        3168847 (31.49%) aligned 0 times
        6305852 (62.66%) aligned exactly 1 time
        589434 (5.86%) aligned >1 times
    68.51% overall alignment rate

This is the running log

    [2015-08-16 17:22:06] Beginning TopHat run (v2.0.14)
    -----------------------------------------------
    [2015-08-16 17:22:06] Checking for Bowtie
                  Bowtie version:        2.2.3.0
    [2015-08-16 17:22:06] Checking for Bowtie index files (transcriptome)..
    [2015-08-16 17:22:06] Checking for Bowtie index files (genome)..
    [2015-08-16 17:22:06] Checking for reference FASTA file
    [2015-08-16 17:22:06] Generating SAM header for /Bowtie2Index/gi
    [2015-08-16 17:22:15] Reading known junctions from GTF file
    [2015-08-16 17:22:15] Pre-filtering multi-mapped left reads
    [2015-08-16 17:22:15] Mapping sample1.R1_And_Singletons_reads to genome genome with Bowtie2 
    [2015-08-16 17:35:01] Pre-filtering multi-mapped right reads
    [2015-08-16 17:35:01] Mapping sample1.R2_reads to genome genome with Bowtie2 
    [2015-08-16 17:48:01] Preparing reads
         left reads: min. length=38, max. length=81, 9677465 kept reads (10224 discarded)
        right reads: min. length=21, max. length=81, 9677279 kept reads (10410 discarded)
    [2015-08-16 17:54:34] Using pre-built transcriptome data..
    [2015-08-16 17:54:34] Mapping left_kept_reads to transcriptome equ with Bowtie2 
    [2015-08-16 17:57:49] Mapping right_kept_reads to transcriptome equ with Bowtie2 
    [2015-08-16 18:00:40] Resuming TopHat pipeline with unmapped reads
    [2015-08-16 18:06:02] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/3)
    [2015-08-16 18:08:38] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/3)
    [2015-08-16 18:11:20] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/3)
    [2015-08-16 18:20:21] Mapping right_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/3)
    [2015-08-16 18:22:46] Mapping right_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/3)
    [2015-08-16 18:25:29] Mapping right_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/3)
    [2015-08-16 18:31:00] Searching for junctions via segment mapping
    [2015-08-16 22:48:52] Retrieving sequences for splices
    [2015-08-16 22:53:39] Indexing splices
    [2015-08-16 23:07:13] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
    [2015-08-16 23:09:46] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
    [2015-08-16 23:12:19] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
    [2015-08-16 23:17:13] Joining segment hits
    [2015-08-16 23:20:47] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
    [2015-08-16 23:23:20] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
    [2015-08-16 23:25:56] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
    [2015-08-16 23:31:21] Joining segment hits
    [2015-08-16 23:34:53] Reporting output tracks
    -----------------------------------------------
    [2015-08-16 23:44:43] A summary of the alignment counts can be found in /projectX/tophat_output/tophat_sample1/align_summary.txt
    [2015-08-16 23:44:43] Run complete: 06:22:36 elapsed

I am doing the analysis of several RNAseq datasets. Different read lengths and different library preparations so it is unlikely to be RNAseq dependent problem. I am working with a model organism with a good reference genome.

Thank you

alignment TopHat RNA-Seq • 3.4k views
ADD COMMENT
0
Entering edit mode

Why the name of your paired end files are different?

$R1_And_Singletons_reads $R2_reads

is this just the name of the files? Or they are really different? Just for knowing.

ADD REPLY
0
Entering edit mode

I add the singletons to the end of the R1 file according to TopHat manual.

ADD REPLY
0
Entering edit mode
9.3 years ago

I found that --prefilter-multihits is the reason why this is happening. I think this is a bug in TopHat because the log files show that only few hundred reads are filtered out as multi hits. I tried TopHat2 version 2.1.0 but the problem is still the same.

ADD COMMENT

Login before adding your answer.

Traffic: 2454 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6