Hello,
I would like to download FASTQ files from the European Nucleotide Archive (ENA) to use them with FastQC, kallisto, etc. In particular, this project: https://www.ebi.ac.uk/ena/browser/view/PRJEB31975 Since it's a huge amount of data, how can I do this if I don't think I can even store it on my PC? (I'm using a Mac.)
Thank you in advance!
Hello,
Thank you for your quick response. I have the files.txt now (attached image), but I'm not sure what it means.
Could you explain a bit, please? Thank you in advance!
The -j option of parallel decides how many jobs are executed in parallel. Downloading many files at a time only makes sense if you have a good internet connection and a hard drive that can absorb that much incoming traffic. Otherwise, just download sequentially and wait.
Understood, thank you so much for your help!!
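To illustrate the -j point above for anyone who prefers Python: the sketch below caps the number of simultaneous downloads with a thread pool. The names `fetch`, `download_all`, and `jobs` are made up for this example; `jobs` plays the role that -j plays for parallel, and `jobs=1` is plain sequential downloading. Treat it as a rough sketch, not a tested downloader.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import urllib.request


def fetch(url, out_dir="."):
    """Download one file into out_dir and return the local path."""
    target = Path(out_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(target))
    return target


def download_all(urls, jobs=1, fetch_one=fetch):
    """Download every URL, running at most `jobs` downloads at a time.

    `jobs` is a hypothetical parameter, analogous to -j; results come
    back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(fetch_one, urls))
```

With a slow connection or a slow disk, `download_all(urls, jobs=1)` is usually the safer choice; raising `jobs` only helps when neither is the bottleneck.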
There is a total of more than 580 GB of raw data associated with this accession: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJEB31975&o=acc_s%3Aa
Make sure you have enough space/bandwidth before trying to download the entire set.
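A quick way to check the space side of this before starting is to compare the free space on the target disk with the expected download size. The helper below is only a sketch: `enough_space` and `safety_factor` are names invented here, and the 20% headroom is an arbitrary assumption (decompressed FASTQ files and analysis output need room too).

```python
import shutil


def enough_space(required_bytes, path=".", safety_factor=1.2):
    """Return True if the disk holding `path` has room for the download.

    `safety_factor` is an assumed headroom multiplier, not a rule.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes * safety_factor
```

For the full dataset mentioned above, the call would be something like `enough_space(580 * 10**9, "/path/to/download/dir")`.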
So there is no way to use this kind of dataset without downloading it to the PC, right? (Sorry if this is a very silly question.)
It is a valid question. To analyze the raw sequence data, you do need to download it, either locally, to a cloud computing environment, or to a compute resource at your local institution. There was an online, web-based tool that allowed the use of SRA accessions. Unfortunately, the name of the tool/lab escapes me for now. If I recall or find it, I will post it.
That appears to be a lot of samples. If you just want to learn, you don't need to download all of the data. A couple of samples would be fine for FastQC/kallisto/Salmon.
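One possible way to pick just a couple of samples is to take the first few runs from the ENA file report for the project. The sketch below assumes the report was requested as TSV with the fields `run_accession`, `fastq_ftp`, and `fastq_bytes`, and that paired-end files are separated by semicolons within a field; `pick_runs` is a name made up for this example. Please check the column names against the report you actually download.

```python
import csv
import io

# ENA Portal API file report for this project (TSV); the field list is an assumption.
REPORT_URL = (
    "https://www.ebi.ac.uk/ena/portal/api/filereport"
    "?accession=PRJEB31975&result=read_run"
    "&fields=run_accession,fastq_ftp,fastq_bytes&format=tsv"
)


def pick_runs(report_tsv, n=2):
    """Return (run_accession, urls, total_bytes) for the first n runs with FASTQ files."""
    picked = []
    for row in csv.DictReader(io.StringIO(report_tsv), delimiter="\t"):
        if not row["fastq_ftp"]:
            continue  # skip runs that have no FASTQ files listed
        # The report lists paths without a scheme, so prepend one.
        urls = ["ftp://" + path for path in row["fastq_ftp"].split(";")]
        size = sum(int(b) for b in row["fastq_bytes"].split(";"))
        picked.append((row["run_accession"], urls, size))
        if len(picked) == n:
            break
    return picked
```

The text of the report can be fetched with, for example, `urllib.request.urlopen(REPORT_URL).read().decode()` and passed to `pick_runs`; the byte totals also give an idea of how much space those few samples will need.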
I'm going to use the services of CESGA (a supercomputing center) for this, but since it's my first time doing this type of thing, I have a lot of questions. I think I will start with a few samples. Thank you so much for your help.
It depends on what you mean by "use". You could download each FASTQ file, process it, and delete it afterwards.
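The download, process, delete idea could look roughly like the following. This is only a sketch: `process_then_delete`, `download`, and `process` are hypothetical names, and the two callables stand in for whatever you actually use (for example, a function that fetches one file and a function that runs FastQC or kallisto on it and returns the small result you want to keep).

```python
import tempfile
from pathlib import Path


def process_then_delete(urls, download, process):
    """Handle one file at a time so only one FASTQ is on disk at once.

    download(url, out_dir) must save the file in out_dir and return its path;
    process(path) must return whatever small result should be kept.
    Both are placeholders supplied by the caller.
    """
    results = []
    for url in urls:
        with tempfile.TemporaryDirectory() as tmp:
            local = Path(download(url, tmp))
            results.append(process(local))
        # Leaving the with-block removes tmp and the downloaded file.
    return results
```

This keeps peak disk usage at roughly the size of the largest single file, at the cost of having to download again if you later want to rerun a step.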
By "use" I mean that I want to build an ML model with this dataset to correctly classify aTB and LTBI patients. Since it's a huge amount of data, I was asking myself how to preprocess the raw data (FastQC --> kallisto --> ...). I think I will start with a few samples. Thank you so much for your help.