Question

Mapping with data stored on external hard disk

3.8 years ago

kuilin • 0

Hi all, I am very new to RNA-seq and am currently teaching myself, so perhaps this is quite a stupid question to ask... I have received the raw data (fasta) of my experiments, run in triplicate, from our core facility. Each sample has 50M reads, so the files are huge. While I'm in the process of requesting more disk storage from our core facility, I was wondering if it will be possible to process the raw data (Cutadapt, star aligned) with our supercomputer, with my raw data stored in the external hard disk? Is there a program that would let the supercomputer communicate with my hard disk?

alignment rna-seq • 1.4k views

updated 3.8 years ago by GenoMax 148k • written 3.8 years ago by kuilin • 0

score 2 · Answer 1 · 2021-03-03

Unless you can hook up your external drive to the supercomputer (if they even allow that), I don't think it will work.

On the other hand, those (cheaper) external disks are not the fastest around, so you might create a bottleneck when reading from (or writing to) them.

And this is speculative: I think you might be quicker waiting for additional storage on the system than running the analysis from external hard drives.

score 1 · Answer 2 · 2021-03-03

As lieven.sterck said, it is best to wait for additional resources from your core facility.

Now, there certainly are ways you could process the data on the supercomputer, but as you do not provide any details (the operating system of the computer hosting the hard drive, whether you have administrative access to that computer, your network setup, and so on), it is hard to give concrete suggestions.

For example, you could mount the external hard-drive with sshfs over the network. But then, you would have two potential bottlenecks, the speed of the external drive and the network speed.
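
To make the two-bottleneck point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the slower of the two links sets the sustained read rate; the drive speed, network speed and data size are hypothetical placeholders, not measurements from this thread, so plug in your own numbers.

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a link speed in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8.0


def effective_throughput(disk_mb_s: float, network_mb_s: float) -> float:
    """Data streamed disk -> network -> cluster is limited by the slower link."""
    return min(disk_mb_s, network_mb_s)


def read_time_hours(data_mb: float, throughput_mb_s: float) -> float:
    """Hours needed to read data_mb megabytes once at the given rate."""
    return data_mb / throughput_mb_s / 3600.0


if __name__ == "__main__":
    # All three values below are assumptions for illustration only.
    disk_mb_s = 100.0                           # hypothetical USB hard drive
    network_mb_s = mbit_to_mbyte_per_s(1000.0)  # hypothetical 1 Gbit/s link
    data_mb = 36_000.0                          # hypothetical total input size

    rate = effective_throughput(disk_mb_s, network_mb_s)
    hours = read_time_hours(data_mb, rate)
    print(f"Effective rate: {rate:.0f} MB/s, one full pass: {hours:.2f} h")
```

Note that this is a best case: real sshfs throughput is usually lower because of encryption and protocol overhead, and an aligner may read the input more than once.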

Also, depending on the specs of your local computer, and how it is used, it may be feasible to perform some steps of the analysis locally (e.g. FastQC and Cutadapt).

score 1 · Answer 3 · 2021-03-03

I was wondering if it will be possible to process the raw data (Cutadapt, star aligned) with our supercomputer, with my raw data stored in the external hard disk?

If you have an account on the supercomputer, it should come with enough space to accommodate your data. Even your home directory there can be used for the analysis, and there should be some scratch space available as well. Look into moving the data from the locally mounted external disk to the supercomputer using secure copy (scp) or secure FTP (sftp).
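
As a minimal sketch of the secure-copy route, the Python snippet below builds an `scp -r` command and, by default, only prints it. The mount point, user name, host name and scratch path are all hypothetical; replace them with the values for your own machine and cluster, and check your cluster's documentation for the recommended transfer node.

```python
import shlex
import subprocess
from typing import List


def build_scp_command(source_dir: str, user: str, host: str, dest_dir: str) -> List[str]:
    """Return an scp command that recursively copies source_dir to the remote host."""
    return ["scp", "-r", source_dir, f"{user}@{host}:{dest_dir}"]


def copy_to_cluster(source_dir: str, user: str, host: str, dest_dir: str,
                    dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or actually run it; returns the command list."""
    cmd = build_scp_command(source_dir, user, host, dest_dir)
    if dry_run:
        # Show the exact command line without touching the network.
        print(shlex.join(cmd))
    else:
        # Raises CalledProcessError if scp exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and host: adjust before setting dry_run=False.
    copy_to_cluster("/mnt/usb/rnaseq", "alice", "hpc.example.org", "/scratch/alice/")
```

Running it as is prints the command so you can inspect it first; the copy only happens when `dry_run=False` is passed.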