4.3 years ago
MatStat
Hi all,
I'm trying to understand what the hardware requirements are for aligning bulk RNA-seq data with bowtie2/STAR, in terms of:
- Processor and cores
- RAM
- SSD hard drive space
- Computing clusters
- Server
The data:
- Seq method: Illumina HiSeq High Output V4
- Single-end (ie single-read)
- 100 human tissue samples
- Each sample yielded 21 million reads.
All the best.
Hi GenoMax,
Thank you for the prompt reply. I've read the answers in the link (and sub-links) you added, but I still don't have an idea of what minimal-to-optimal settings would look like, for example on cluster computing servers.
Just as an example, I tried to run 1 fastq sample on my Mac (i5, 16 GB RAM, 500 GB SSD); it was extremely strenuous and took more than 20 hrs.
Thanks.
There is no way around a lack of memory or compute power. With most aligners you will need 30+ GB of free RAM for human/mouse genomes. If you start using more than a few threads (say 6-8), that requirement starts going up. Just throwing tons of cores at the problem does not solve it either, since the efficiency of the software becomes important at that stage. Unless you are working with server hardware, I/O on a local machine (even with SSDs) is going to limit the speed at which data can be aligned. It is not uncommon for it to take a few hours to align 20-50M reads.
Any good 2-socket server (not a desktop) will provide anywhere between 8 and 64+ cores (depending on the CPUs chosen). You would want at least 128 GB of RAM to have comfortable headroom for other tasks. Storage is really up to you. Ideally you will need performant network block storage mounted on this server via 10 Gb Ethernet, InfiniBand, etc. to provide the fastest possible read/write speeds. If that is not available, you will need to resort to local SSDs. Keep in mind that SSDs wear out and have a finite life if continuously written to.
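To turn those numbers into a rough plan for 100 samples, here is a back-of-the-envelope sketch in Python. It only does arithmetic: the 3 hours per sample, 32 GB per job, 8 threads per job, 32 cores and 16 GB reserved for other tasks are all assumptions picked for illustration, not measurements, and the function name `plan_alignment` is made up. Real throughput will also depend on I/O, which this ignores.

```python
import math

def plan_alignment(n_samples, hours_per_sample, total_ram_gb, ram_per_job_gb,
                   total_cores, threads_per_job, reserve_ram_gb=16):
    """Estimate how many alignment jobs fit at once and the total wall-clock hours.

    Returns (concurrent_jobs, total_hours). If not even one job fits in RAM,
    returns (0, math.inf).
    """
    usable_ram = max(0, total_ram_gb - reserve_ram_gb)  # keep headroom for other tasks
    jobs_by_ram = usable_ram // ram_per_job_gb           # RAM-limited job count
    jobs_by_cpu = total_cores // threads_per_job         # core-limited job count
    concurrent = min(jobs_by_ram, jobs_by_cpu)
    if concurrent < 1:
        return 0, math.inf
    batches = math.ceil(n_samples / concurrent)          # samples run in waves
    return concurrent, batches * hours_per_sample

# Hypothetical server: 128 GB RAM, 32 cores (assumed values).
concurrent, total_hours = plan_alignment(
    n_samples=100, hours_per_sample=3,
    total_ram_gb=128, ram_per_job_gb=32,
    total_cores=32, threads_per_job=8,
)
print(concurrent, total_hours)  # 3 102

# A 16 GB laptop cannot hold even one 32 GB job under these assumptions.
laptop = plan_alignment(100, 3, 16, 32, 4, 4)
print(laptop)  # (0, inf)
```

Under these assumptions RAM, not core count, is what caps the number of simultaneous jobs, which is why adding cores alone does not help.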
OK, great, thanks a lot for the answer.
Do you really need to use alignment for bulk RNA-seq? Why not use pseudoalignment? It has lower memory and compute requirements.
Hi dsull, I am reproducing results according to a workflow protocol from GitHub, which means I need to do what they did. In addition, I assume they don't use pseudoalignment because the workflow needs to be sensitive enough to capture unmapped reads, which can then be used further.
Best
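For anyone following the same route, here is a minimal sketch of how single-end alignment commands that also keep the unmapped reads could be put together. The index locations, file names and thread count are placeholders, and the helper names are made up; this is not the command line from the GitHub protocol mentioned above. It only builds and prints the commands, it does not run them.

```python
import shlex

def bowtie2_command(index_prefix, fastq, sam_out, unmapped_out, threads=8):
    """Build a bowtie2 call for single-end reads; --un writes reads that fail to align."""
    return ["bowtie2", "-p", str(threads), "-x", index_prefix,
            "-U", fastq, "-S", sam_out, "--un", unmapped_out]

def star_command(genome_dir, fastq, out_prefix, threads=8):
    """Build a STAR call for single-end reads; unmapped reads go to a separate FASTQ-style file."""
    return ["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
            "--readFilesIn", fastq, "--outFileNamePrefix", out_prefix,
            "--outReadsUnmapped", "Fastx"]

# Placeholder paths, purely for illustration.
print(shlex.join(bowtie2_command("index/genome", "sample01.fastq",
                                 "sample01.sam", "sample01.unmapped.fastq")))
print(shlex.join(star_command("star_index", "sample01.fastq", "sample01_")))
```

The printed lines can be pasted into a shell or a cluster job script once the placeholder paths are replaced with real ones.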