Dear community,
Does anybody have experience with run times for RNA-Seq alignment with NovoAlign? In particular, USeq MakeTranscriptome, which builds the splice-junction (transcriptome) sequences as suggested in [1], seems to take quite long. It would help me a lot to have a rough estimate of how long the alignment, or at least some of its steps, takes to finish, or alternatively how large the resulting files are.
I am running MakeTranscriptome on both hg19 and hg38 with the following options:
java -jar /opt/useq/Apps/MakeTranscriptome -f <path-to-hg19-fa.gz-files-per-chromosome-from-ucsc-golden-path> -u <path-to-refFlat-txt-from-ucsc-genome-browser> -r 96 -n 60000 -m 10 -s
On hg19 it has now been running for three days. Afterwards, I will run NovoAlign with the following commands:
novoindex -n hg19 <path-for-index> <path-to-masked-genome> <path-to-transcriptome-file-1> <path-to-transcriptome-file-2>
novoalign -o SAM -f <forward-fastq> <reverse-fastq> -d <path-to-index> -r All 10 -v 0 70 70 '[>]([^:]*)'
Finally, I will need to convert the transcriptome coordinates back to genomic coordinates:
java -jar /opt/useq/Apps/SamTranscriptomeParser -f <path-to-sam-file> -a 50000 -n 100 -u
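To collect per-step run times and output sizes on my own machine (and to report them back here), something like the following could wrap each command. This is only a minimal sketch: `time_step` is a hypothetical helper, not part of USeq or NovoAlign, and it relies only on standard tools (`date`, `du`, `cut`, `printf`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of USeq or NovoAlign): runs one pipeline
# step, then prints "<label> TAB <elapsed seconds>s TAB <output size>".
# With set -e, a failing step aborts the whole script.
set -euo pipefail

time_step() {
  local label="$1" output="$2"
  shift 2
  local start end size
  start=$(date +%s)
  "$@"                              # run the actual command unchanged
  end=$(date +%s)
  size="missing"
  if [ -e "$output" ]; then
    size=$(du -sh "$output" | cut -f1)   # human-readable size of file or directory
  fi
  printf '%s\t%ss\t%s\n' "$label" "$((end - start))" "$size"
}
```

Because `novoalign -o SAM` writes to standard output, that step would need to be wrapped in `sh -c '... > <sam-file>'` so that the redirection happens inside the timed command and `<sam-file>` can be passed as the output argument.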
I am running everything on a machine with 8x Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz and 48 GB RAM, on a GM12878 data set with approximately 118,000,000 paired reads [2].
I would greatly appreciate estimates for any of these steps!
Cheers, Tamara