Hi all,
To map RNA-seq FASTQ files of about 40M to the Ensembl human genome (hg) using tophat2, I have the following idea:
- Split the fastq files into small files of 10M
- Map the small files separately to hg and generate .bam files
- Merge the generated .bam files into one large .bam file
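For reference, the splitting step could be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes uncompressed FASTQ with exactly four lines per record, it reads "10M" as 10 million reads, and the function name `split_fastq` and the `.partNNN.fastq` naming are made up for illustration.

```python
from itertools import islice
from pathlib import Path


def split_fastq(fastq_path, reads_per_chunk, out_prefix):
    """Split a FASTQ file into chunks of at most reads_per_chunk records.

    Assumes an uncompressed file with exactly 4 lines per record.
    Returns the list of chunk paths in the order they were written.
    """
    lines_per_chunk = 4 * reads_per_chunk
    chunk_paths = []
    with open(fastq_path) as handle:
        while True:
            # Peek at the next line so that no empty trailing chunk is created.
            first_line = next(handle, None)
            if first_line is None:
                break
            chunk_path = Path(f"{out_prefix}.part{len(chunk_paths):03d}.fastq")
            with open(chunk_path, "w") as out:
                out.write(first_line)
                # Stream the rest of the chunk without loading it into memory.
                out.writelines(islice(handle, lines_per_chunk - 1))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

For paired-end data, both mate files would have to be split with the same `reads_per_chunk` so that the pairs stay in sync across chunks.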
I have no experience with the results of this approach, so I am asking for help from people who have already tried it. In other words, does this method give the same result as mapping without splitting?
Any help, advice, or suggestions?
I do not know if splitting makes any difference, given that tophat2 can run on multiple threads (--num-threads). If you pass the GTF file to tophat, it rebuilds the transcriptome for every alignment run. Instead, you could create the transcriptome FASTA beforehand and provide it to tophat (--transcriptome-index) to reduce the time. It would be better to use STAR, as mentioned below, unless you have any specific concerns with it.
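To make the suggestion concrete, here is a rough Python sketch that only assembles the command lines: build the transcriptome index once, align with several threads while reusing that index, and merge per-chunk BAM files with samtools if you still split. All file names, the index prefixes, and the helper function names are hypothetical placeholders; check the flags against your installed tophat2 and samtools versions before running anything.

```python
import subprocess


def build_transcriptome_index_cmd(gtf, transcriptome_prefix, genome_index):
    """One-off run: with -G and --transcriptome-index but no reads,
    tophat2 builds the transcriptome index and exits."""
    return ["tophat2", "-G", gtf,
            f"--transcriptome-index={transcriptome_prefix}", genome_index]


def align_cmd(reads, out_dir, transcriptome_prefix, genome_index, threads=8):
    """Alignment run that reuses the prebuilt transcriptome index.
    reads is a list with one FASTQ (single-end) or two (paired-end)."""
    return ["tophat2", "--num-threads", str(threads),
            f"--transcriptome-index={transcriptome_prefix}",
            "-o", out_dir, genome_index, *reads]


def merge_cmd(merged_bam, chunk_bams):
    """Merge per-chunk BAM files into a single BAM with samtools."""
    return ["samtools", "merge", merged_bam, *chunk_bams]


if __name__ == "__main__":
    # Hypothetical paths; replace with your own index prefixes and files.
    commands = [
        build_transcriptome_index_cmd("genes.gtf", "transcriptome/known", "hg_index"),
        align_cmd(["reads.fastq"], "tophat_out", "transcriptome/known", "hg_index"),
    ]
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Returning the commands as argument lists keeps them easy to inspect or log before anything is executed.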