Question

Splitting fastq file for mapping with tophat2

0

8.5 years ago

debitboro ▴ 270

Hi all,

To map RNA-Seq FASTQ files of about 40M reads to the Ensembl human genome (hg) using tophat2, I have the following idea:

Split the FASTQ files into smaller files of 10M reads each
Map the small files separately to hg and generate .bam files
Merge the generated .bam files into one large .bam file
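
The splitting step above could be sketched as follows. This is only a minimal illustration, assuming an uncompressed FASTQ file with exactly four lines per read; the function name `split_fastq` and the chunk naming scheme are hypothetical. For paired-end data, both mate files would need to be split with the same chunk size so that the pairs stay in sync.

```python
from pathlib import Path


def split_fastq(fastq_path, reads_per_chunk, out_dir):
    """Split an uncompressed FASTQ file into chunks of at most reads_per_chunk reads.

    Assumes the standard 4-line record layout (header, sequence, '+', qualities).
    Returns the chunk paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(fastq_path).stem
    lines_per_chunk = 4 * reads_per_chunk
    chunk_paths = []
    out = None
    try:
        with open(fastq_path) as handle:
            # Stream line by line so the whole file is never held in memory.
            for line_number, line in enumerate(handle):
                if line_number % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    chunk_path = out_dir / f"{stem}.part{len(chunk_paths):03d}.fastq"
                    chunk_paths.append(chunk_path)
                    out = open(chunk_path, "w")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

For a 40M-read file, `split_fastq("sample.fastq", 10_000_000, "chunks")` would be expected to write four chunk files (the file and directory names here are placeholders).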

I have no experience with this method, so I am asking for help from people who have already tried it. In other words, does this method give the same result as mapping without splitting?

Any help, advice, or suggestions?

RNA-Seq tophat2 fatsq files splitting • 2.5k views

ADD COMMENT • link updated 8.5 years ago by tiago211287 ★ 1.5k • written 8.5 years ago by debitboro ▴ 270

0

I do not know whether it makes any difference, given that tophat2 can run on multiple threads (--num-threads). If you pass the GTF file to tophat, it creates a transcriptome for every alignment run. Instead, you could create the transcriptome FASTA beforehand and provide it to tophat (--transcriptome-index) to reduce the run time.
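
As a rough sketch of that two-step approach, the helpers below only build the command lines and print them; they do not run anything. The helper names, file names, index prefix, and thread count are all placeholders, and `hg_index` stands for whatever Bowtie2 index basename you actually use. Only the `-G`, `-o`, `--num-threads`, and `--transcriptome-index` options come from tophat itself.

```python
import shlex


def build_index_command(gtf, transcriptome_prefix, genome_index):
    """One-off run: with -G and --transcriptome-index but no reads,
    tophat only builds the transcriptome files under the given prefix."""
    return [
        "tophat2",
        "-G", gtf,
        f"--transcriptome-index={transcriptome_prefix}",
        genome_index,
    ]


def build_align_command(reads, out_dir, transcriptome_prefix, genome_index, threads=4):
    """Alignment run that reuses the prebuilt transcriptome index (no -G needed)."""
    return [
        "tophat2",
        "-o", out_dir,
        "--num-threads", str(threads),
        f"--transcriptome-index={transcriptome_prefix}",
        genome_index,
        reads,
    ]


if __name__ == "__main__":
    # Placeholder paths; printed for inspection rather than executed.
    print(shlex.join(build_index_command("genes.gtf", "transcriptome_data/known", "hg_index")))
    print(shlex.join(build_align_command("sample.fastq", "out_sample", "transcriptome_data/known", "hg_index", threads=8)))
```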

It would be better to use STAR, as mentioned below, unless you have any specific concerns with it.

ADD REPLY • link 8.5 years ago by GouthamAtla 12k

score 1 · Answer 1 · 2016-05-16

1

8.5 years ago

tiago211287 ★ 1.5k

You would get the same result. But if you are trying to speed up the alignment, I suggest using the STAR aligner, which will map 40M reads in 20-30 minutes; the alignment is also better. The drawback is that you need a machine with a lot of RAM (about 64 GB minimum).

In addition, if you still need to use Tophat2, you would have to run the 10M-read pieces in parallel to actually get results faster. If you align them one after another, it will take even longer.
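
One way to run the pieces in parallel and then merge the results might look like the sketch below. It assumes `tophat2` and `samtools` are on the PATH and that each tophat run writes its alignments to `accepted_hits.bam` in its own output directory; the function names, the `runner` parameter, and the per-run thread count are illustrative choices.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def align_command(chunk_fastq, out_dir, genome_index, threads):
    """Build one tophat2 call for a single chunk, with its own output directory."""
    return ["tophat2", "-p", str(threads), "-o", str(out_dir),
            str(genome_index), str(chunk_fastq)]


def merge_command(merged_bam, chunk_bams):
    """Build the samtools call that combines the per-chunk BAM files."""
    return ["samtools", "merge", str(merged_bam), *[str(b) for b in chunk_bams]]


def run_parallel(commands, max_workers, runner=subprocess.run):
    """Run the commands concurrently and return the results in input order.

    Threads are enough here because each worker only waits on an external
    process. `runner` can be swapped out for a dry run.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: runner(cmd, check=True), commands))
```

With four chunks, you would build four `align_command` lists, pass them to `run_parallel(commands, max_workers=4)`, and then run `merge_command("merged.bam", [...])` on the four `accepted_hits.bam` files. Keep `max_workers` times the per-run thread count within the number of cores available, otherwise the runs will compete with each other.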

0

Thank you tiago,

With the splitting, it takes 4 hours to complete the mapping with tophat2.

ADD REPLY • link 8.5 years ago by debitboro ▴ 270