I am trying to align RNA-Seq data to both the human and mouse genomes but am running into trouble. I merged the human genome.fa with the mouse genome.fa into a combined genome.fa, and then used only the human genes.gtf for transcriptome assembly. I had previously aligned these RNA-Seq data to the human genome alone without problems, but my client said the RNA may have been contaminated with mouse RNA. Since more than 50% of the reads could not be mapped to human, I would like to align the data to a combined human and mouse genome. Any suggestions on how to do this with TopHat2? With my procedure, I got the errors below:
[2016-04-13 14:06:17] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2016-04-13 14:06:17] Checking for Bowtie
Bowtie version: 2.1.0.0
[2016-04-13 14:06:17] Checking for Samtools
Samtools version: 0.1.19.0
[2016-04-13 14:06:17] Checking for Bowtie index files (genome)..
[2016-04-13 14:06:17] Checking for reference FASTA file
[2016-04-13 14:06:17] Generating SAM header for /data1/workspace/DCI/Sarantopoulos/RNASeq/Data/human_and_mouse_genome_combined/human_mouse_combined/Bowtie2Index/genome
format: fastq
quality scale: phred33 (default)
[2016-04-13 14:06:22] Reading known junctions from GTF file
[2016-04-13 14:06:27] Pre-filtering multi-mapped left reads
[2016-04-13 14:06:27] Mapping JP01_S9_L003_R1_001-trimmed_1 to genome genome with Bowtie2
[2016-04-13 14:41:46] Pre-filtering multi-mapped right reads
[2016-04-13 14:41:46] Mapping JP01_S9_L003_R2_001-trimmed_2 to genome genome with Bowtie2
[2016-04-13 15:16:00] Preparing reads
left reads: min. length=114, max. length=126, 32220590 kept reads (1960224 discarded)
right reads: min. length=107, max. length=126, 32456688 kept reads (1724126 discarded)
[2016-04-13 15:40:07] Building transcriptome data files..
[FAILED]
Error: gtf_to_fasta returned an error.
There was no such error when I previously aligned to the human genome only.
You should try binning the reads by species before aligning them. Give BBSplit a try for this.
As you discovered, it gets very messy if you try to align to both genomes at the same time (unless you create custom GTF files, indexes, etc.).
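If you do want to go the custom-reference route, here is a rough Python sketch of one way to prepare the files: prefix every sequence name with a species tag in both the FASTA and the GTF, so that names such as chr1 cannot collide between the two genomes and the GTF seqnames still match the FASTA headers. I can't tell from your log whether a name clash is what made gtf_to_fasta fail, so treat this as an assumption, not a diagnosis. The function names, the `hs_`/`mm_` prefixes, and the file names in the comments are all made up for illustration.

```python
def prefix_fasta_headers(lines, prefix):
    """Yield FASTA lines with `prefix` added to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            # ">chr1 ..." becomes ">hs_chr1 ..." when prefix is "hs_"
            yield ">" + prefix + line[1:]
        else:
            yield line


def prefix_gtf_seqnames(lines, prefix):
    """Yield GTF lines with `prefix` added to the seqname (first column)."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            # Leave comment and blank lines untouched.
            yield line
        else:
            # The seqname is the first tab-separated field, so prefixing
            # the whole line prefixes exactly that field.
            yield prefix + line


def write_combined(out_path, sources, transform):
    """Concatenate several inputs into one file, renaming sequences.

    `sources` is a list of (input_path, prefix) pairs and `transform`
    is one of the two generator functions above.
    """
    with open(out_path, "w") as out:
        for in_path, prefix in sources:
            with open(in_path) as handle:
                out.writelines(transform(handle, prefix))


if __name__ == "__main__":
    # Tiny in-memory demonstration; no files are read or written here.
    fasta = [">chr1\n", "ACGTACGT\n"]
    gtf = ['chr1\tdemo\texon\t1\t8\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n']
    print("".join(prefix_fasta_headers(fasta, "hs_")), end="")
    print("".join(prefix_gtf_seqnames(gtf, "hs_")), end="")

    # Hypothetical real usage (file names are placeholders):
    # write_combined("combined.fa",
    #                [("human.fa", "hs_"), ("mouse.fa", "mm_")],
    #                prefix_fasta_headers)
    # write_combined("combined.gtf",
    #                [("human.gtf", "hs_"), ("mouse.gtf", "mm_")],
    #                prefix_gtf_seqnames)
```

After something like this, the Bowtie2 index would have to be rebuilt from the combined FASTA with bowtie2-build, and the combined GTF passed to TopHat in place of the human-only one. That is a fair amount of extra work, which is why splitting the reads first is usually the easier path.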