8.4 years ago
jrxu.bioinf
Hello,
I am a new user of TopHat. The version in use is v2.1.1 (tophat2).
I used the default parameters except for --no-coverage-search, which I added to save time. The read length is 102. The run output looks fine (as shown below), but 0% of the reads are mapped to the genome.
BTW, this exact same setting has successfully mapped another set of RNA-seq data (read length = 30).
Thanks!
[2016-06-26 14:44:07] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2016-06-26 14:44:07] Checking for Bowtie
Bowtie version: 2.2.9.0
[2016-06-26 14:44:07] Checking for Bowtie index files (genome)..
[2016-06-26 14:44:07] Checking for reference FASTA file
[2016-06-26 14:44:07] Generating SAM header for ~/data/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index/genome
[2016-06-26 14:44:58] Reading known junctions from GTF file
[2016-06-26 14:45:01] Preparing reads
left reads: min. length=102, max. length=102, 36967894 kept reads (26707 discarded)
[2016-06-26 14:58:17] Building transcriptome data files ./tophat_out/tmp/genes
[2016-06-26 14:58:55] Building Bowtie index from genes.fa
[2016-06-26 15:05:33] Mapping left_kept_reads to transcriptome genes with Bowtie2
[2016-06-26 15:28:27] Resuming TopHat pipeline with unmapped reads
[2016-06-26 15:28:27] Mapping left_kept_reads.m2g_um to genome genome with Bowtie2
[2016-06-26 16:14:47] Mapping left_kept_reads.m2g_um_seg1 to genome genome with Bowtie2 (1/4)
[2016-06-26 16:34:49] Mapping left_kept_reads.m2g_um_seg2 to genome genome with Bowtie2 (2/4)
[2016-06-26 16:50:05] Mapping left_kept_reads.m2g_um_seg3 to genome genome with Bowtie2 (3/4)
[2016-06-26 17:00:49] Mapping left_kept_reads.m2g_um_seg4 to genome genome with Bowtie2 (4/4)
[2016-06-26 17:18:38] Searching for junctions via segment mapping
[2016-06-26 17:28:01] Retrieving sequences for splices
[2016-06-26 17:29:11] Indexing splices
Building a SMALL index
[2016-06-26 17:30:18] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/4)
[2016-06-26 17:37:09] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/4)
[2016-06-26 17:44:09] Mapping left_kept_reads.m2g_um_seg3 to genome segment_juncs with Bowtie2 (3/4)
[2016-06-26 17:50:15] Mapping left_kept_reads.m2g_um_seg4 to genome segment_juncs with Bowtie2 (4/4)
[2016-06-26 17:58:30] Joining segment hits
[2016-06-26 18:05:44] Reporting output tracks
-----------------------------------------------
[2016-06-26 18:18:49] A summary of the alignment counts can be found in ./tophat_out/align_summary.txt
[2016-06-26 18:18:49] Run complete: 03:34:42 elapsed
The alignment summary is below:
Reads:
Input : 36994601
Mapped : 5067 ( 0.0% of input)
of these: 1534 (30.3%) have multiple alignments (3 have >20)
0.0% overall read mapping rate.
Any time you see no alignments, or fewer than expected, the first thing to try is to take a random sample of reads (10-15) and BLAST them at NCBI. If the top hits are not from the genome you expect, then you will have to start figuring out what went wrong. If the BLAST hits are partial, then it is possible that you have adapter contamination in your data (did you look at the data with FastQC before aligning?) and you would need to trim the reads before alignment.
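A minimal sketch of how that random sample could be drawn, assuming an uncompressed FASTQ with the standard four lines per record. The function names (`read_fastq`, `sample_reads`, `to_fasta`) and the file name in the usage note are hypothetical; the sampling is plain reservoir sampling, so the whole file never has to fit in memory.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def sample_reads(records: Iterable[Record], k: int = 15, seed: int = 0) -> List[Record]:
    """Pick k records uniformly at random in a single pass (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text that can be pasted into a BLAST form."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)
```

Usage would look something like `with open("reads.fastq") as fh: print(to_fasta(sample_reads(read_fastq(fh))))`, where `reads.fastq` stands in for your own file.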
I should have used "--split-files" when converting SRA to FASTQ. Without that parameter, the two reads of each pair are merged into a single read, which causes the problem!
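Re-dumping from the SRA file with the split option is the cleaner fix. For illustration only, here is a sketch of what undoing the merge would look like on an existing FASTQ record, assuming both mates have the same length (so a 102 bp record would be two 51 bp mates). That equal-length assumption is mine, not something the log confirms, and the names `split_concatenated` and `format_fastq` are hypothetical.

```python
from typing import Optional, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality)


def split_concatenated(record: Record, mate1_len: Optional[int] = None) -> Tuple[Record, Record]:
    """Split one merged record into two mates.

    If mate1_len is not given, the read is cut exactly in half
    (ASSUMPTION: both mates have the same length).
    """
    name, seq, qual = record
    if len(seq) != len(qual):
        raise ValueError(f"{name}: sequence and quality lengths differ")
    if mate1_len is None:
        if len(seq) % 2:
            raise ValueError(f"{name}: odd length, cannot split in half")
        mate1_len = len(seq) // 2
    if not 0 < mate1_len < len(seq):
        raise ValueError(f"{name}: mate1_len must fall inside the read")
    mate1 = (name + "/1", seq[:mate1_len], qual[:mate1_len])
    mate2 = (name + "/2", seq[mate1_len:], qual[mate1_len:])
    return mate1, mate2


def format_fastq(record: Record) -> str:
    """Render a record as a 4-line FASTQ entry."""
    name, seq, qual = record
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

The two mates would then be written to separate files and passed to the aligner as paired-end input.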
Use Kraken to screen reads; it is faster than BLAST and allows you to screen the whole dataset.
The default Kraken database only has bacterial, archaeal and viral data, so that would not always provide a useful answer. Surely blasting 10-15 sequences (in this case, where almost no reads are aligning) would be much faster than Kraken.
You can add sequences to Kraken databases or alter them. Sure, blasting a few reads is quick, but it won't give you an idea of the degree of contamination.
I tested several reads. Each read maps perfectly to the correct transcript, BUT the first half of the read maps to the forward strand and the second half to the reverse strand. How should I handle this read format? Thanks.
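A rough way to check more reads for this half-forward, half-reverse pattern, assuming you already have the matching transcript sequence as a plain string. It only looks for exact matches and cuts the read exactly in the middle, so treat it as a quick sanity check rather than an aligner; `looks_like_merged_pair` is a hypothetical name.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def looks_like_merged_pair(read: str, transcript: str) -> bool:
    """True if the first half of the read matches the transcript as-is
    and the second half matches only after reverse-complementing.

    ASSUMPTION: the two halves are of equal length and match exactly.
    """
    half = len(read) // 2
    first, second = read[:half], read[half:]
    return first in transcript and reverse_complement(second) in transcript
```

If most sampled reads return True, the reads are probably two mates concatenated into one record rather than genuine single-end reads.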
Did you run FastQC on these?
That most likely indicates that you have short inserts (and thus read-through into, and contamination with, Illumina adapters). You would need to trim these reads to get them to align.