My access to a cluster is limited at the moment and I would like to know if the computations can be completed in a reasonable amount of time (2-3 hours) with my desktop computer. I realize this is dependent on a large number of factors but a rough estimate/range would be helpful so I don't waste my time. Here is some relevant information:
- Data: Illumina HiSeq 4000 single-read data downstream of an RNA-immunoprecipitation
- FASTQ file size range: 220-450 MB
- Read count range: 12-30 million reads
- Reference genome: Mouse (898 MB)
- Index: Mouse (3.8 GB)
- Aligner: Tophat2
- CPU: Intel i5 4-core, 2.5 GHz
- RAM: 8 GB
- HDD: 5400 RPM
I have multiple FASTQ files to align, so as long as the largest file can be aligned within 2-3 hours, I'm okay with that. It will take me a day to align them all, but that is reasonable given my current access to resources.
Please let me know if there is any more useful information to include.
Take a subset of your reads and see how long it takes. Extrapolate from the results.
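A minimal sketch of that subset-and-extrapolate test, assuming uncompressed FASTQ input with four lines per record. The helper names (`take_first_reads`, `extrapolate_seconds`, `time_tophat`) and all paths are hypothetical. Timing two subsets of different sizes lets you separate the fixed start-up cost (such as loading the index) from the per-read cost, rather than assuming runtime scales from zero. Note that the first reads in a file are not a random sample, so treat the estimate as rough.

```python
import itertools
import subprocess
import time


def take_first_reads(src_path, dst_path, n_reads):
    """Copy the first n_reads records (4 lines each) of an uncompressed FASTQ."""
    n_lines = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in itertools.islice(src, 4 * n_reads):
            dst.write(line)
            n_lines += 1
    return n_lines // 4  # number of records actually written


def extrapolate_seconds(n_small, t_small, n_large, t_large, n_total):
    """Fit t = overhead + per_read * n through two timed runs, predict n_total."""
    per_read = (t_large - t_small) / (n_large - n_small)
    overhead = t_small - per_read * n_small
    return overhead + per_read * n_total


def time_tophat(index_base, fastq_path, out_dir, threads=4):
    """Time one tophat2 run. index_base and out_dir are placeholder paths."""
    start = time.perf_counter()
    subprocess.run(
        ["tophat2", "-p", str(threads), "-o", out_dir, index_base, fastq_path],
        check=True,
    )
    return time.perf_counter() - start
```

For example, you might time subsets of 100,000 and 500,000 reads with `time_tophat`, then pass both measurements and the full read count (say 30 million) to `extrapolate_seconds`. A straight-line fit is only an assumption; if memory runs short on the full file, the real runtime could be much worse than predicted.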
Unless your desktop computer can only run for 2-3 hours at a time, why are you worried? If you must give your computer a rest after 3 hours, break your starting fasta file up into smaller chunks based on the test @5heikki suggested.
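If you do go the chunking route, splitting could look something like the sketch below. It assumes an uncompressed FASTQ file with four lines per record; `split_fastq` and the output naming scheme are hypothetical. The per-chunk alignments would need to be merged afterwards (for example with `samtools merge`), and since TopHat discovers splice junctions from the reads it is given, aligning chunks separately may not give exactly the same result as aligning the whole file at once.

```python
import itertools


def split_fastq(src_path, out_prefix, reads_per_chunk):
    """Write consecutive chunks of reads_per_chunk records; return chunk paths."""
    paths = []
    with open(src_path) as src:
        while True:
            # Lazily take the next chunk's lines (4 lines per FASTQ record).
            chunk = itertools.islice(src, 4 * reads_per_chunk)
            first = next(chunk, None)
            if first is None:  # input exhausted
                break
            path = f"{out_prefix}.part{len(paths):03d}.fastq"
            with open(path, "w") as dst:
                dst.write(first)
                dst.writelines(chunk)
            paths.append(path)
    return paths
```

Choose `reads_per_chunk` from your timing test so that each chunk fits comfortably inside the time window you have.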
Why are you using fasta files (which I assume you converted from fastq)? Hopefully 8 GB of RAM is not going to be the limiting factor. I don't remember how much memory TopHat needs for a mouse-sized genome.
That was a typo - I have been using FASTQ files. I've updated my post.