Hi,
I am running TopHat2/2.0.14 using this code:
tophat \
--transcriptome-index "${transcriptome_index}" \
--num-threads 4 \
--output-dir "${output_dir}" \
--coverage-search \
--microexon-search \
--prefilter-multihits \
--library-type "${lib_type}" \
"${Bowtie2_genome_index_base}" $R1_And_Singletons_reads $R2_reads
The final align_summary
always show very low mapping rate. This is one example:
Left reads:
Input : 9677854
Mapped : 1573190 (16.3% of input)
of these: 33728 ( 2.1%) have multiple alignments (9871 have >20)
Right reads:
Input : 9677930
Mapped : 1549704 (16.0% of input)
of these: 33265 ( 2.1%) have multiple alignments (9793 have >20)
Unpaired reads:
Input : 376485
Mapped : 58233 (15.5% of input)
of these: 1312 ( 2.3%) have multiple alignments (414 have >20)
16.1% overall read mapping rate.
Aligned pairs: 1001665
of these: 24413 ( 2.4%) have multiple alignments
1666 ( 0.2%) are discordant alignments
10.3% concordant pair alignment rate.
I checked the log files of the TopHat run. Surprisingly, I found the initial Bowtie mapping rate very high. Here is the bowtie.sample1.R1_And_Singletons_reads.log
10064133 reads; of these:
10064133 (100.00%) were unpaired; of these:
3168847 (31.49%) aligned 0 times
6305852 (62.66%) aligned exactly 1 time
589434 (5.86%) aligned >1 times
68.51% overall alignment rate
This is the running log
[2015-08-16 17:22:06] Beginning TopHat run (v2.0.14)
-----------------------------------------------
[2015-08-16 17:22:06] Checking for Bowtie
Bowtie version: 2.2.3.0
[2015-08-16 17:22:06] Checking for Bowtie index files (transcriptome)..
[2015-08-16 17:22:06] Checking for Bowtie index files (genome)..
[2015-08-16 17:22:06] Checking for reference FASTA file
[2015-08-16 17:22:06] Generating SAM header for /Bowtie2Index/gi
[2015-08-16 17:22:15] Reading known junctions from GTF file
[2015-08-16 17:22:15] Pre-filtering multi-mapped left reads
[2015-08-16 17:22:15] Mapping sample1.R1_And_Singletons_reads to genome genome with Bowtie2
[2015-08-16 17:35:01] Pre-filtering multi-mapped right reads
[2015-08-16 17:35:01] Mapping sample1.R2_reads to genome genome with Bowtie2
[2015-08-16 17:48:01] Preparing reads
left reads: min. length=38, max. length=81, 9677465 kept reads (10224 discarded)
right reads: min. length=21, max. length=81, 9677279 kept reads (10410 discarded)
[2015-08-16 17:54:34] Using pre-built transcriptome data..
[2015-08-16 17:54:34] Mapping left_kept_reads to transcriptome equ with Bowtie2
[2015-08-16 17:57:49] Mapping right_kept_reads to transcriptome equ with Bowtie2
[2015-08-16 18:00:40] Resuming TopHat pipeline with unmapped reads
[2015-08-16 18:06:02] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/3)
[2015-08-16 18:08:38] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/3)
[2015-08-16 18:11:20] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/3)
[2015-08-16 18:20:21] Mapping right_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/3)
[2015-08-16 18:22:46] Mapping right_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/3)
[2015-08-16 18:25:29] Mapping right_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/3)
[2015-08-16 18:31:00] Searching for junctions via segment mapping
[2015-08-16 22:48:52] Retrieving sequences for splices
[2015-08-16 22:53:39] Indexing splices
[2015-08-16 23:07:13] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2015-08-16 23:09:46] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2015-08-16 23:12:19] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2015-08-16 23:17:13] Joining segment hits
[2015-08-16 23:20:47] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/3)
[2015-08-16 23:23:20] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/3)
[2015-08-16 23:25:56] Mapping right_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/3)
[2015-08-16 23:31:21] Joining segment hits
[2015-08-16 23:34:53] Reporting output tracks
-----------------------------------------------
[2015-08-16 23:44:43] A summary of the alignment counts can be found in /projectX/tophat_output/tophat_sample1/align_summary.txt
[2015-08-16 23:44:43] Run complete: 06:22:36 elapsed
I am doing the analysis of several RNAseq datasets. Different read lengths and different library preparations so it is unlikely to be RNAseq dependent problem. I am working with a model organism with a good reference genome.
Thank you
Why the name of your paired end files are different?
is this just the name of the files? Or they are really different? Just for knowing.
I add the singletons to the end of the R1 file according to TopHat manual.