How much HDD, RAM, and internet speed are required to process whole-genome paired-end data, with each fastq file being 170G? My machine has the following configuration:
HDD-1TB
RAM-16GB
But it's taking >1hr for the following simple command to process:
Your question could use some clarification. How many of these files are you referring to? Just two for one sample, or more?
I am not sure why you need internet connectivity to process genome data (assuming you have the reference downloaded and indexed).
RAM is going to be the limiting factor if this is human genome (or similar-sized) data. Many aligners need ~30G of free RAM. Your best bet may be bwa, which has one of the lightest memory requirements of any aligner (~6G free for human data).
Counting lines in a fastq file can't realistically be considered processing data, and it doesn't give you any idea of how long it may take to scan/trim/align the data. You should also keep the fastq files compressed to save space. Most NGS programs understand compressed data and will work with it seamlessly.
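If you do want a quick read count as a sanity check, it can be done without decompressing the files on disk. The following is only a minimal sketch: the function name count_fastq_reads is made up for illustration, and it assumes a standard 4-line-per-record fastq with no wrapped sequence lines.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a fastq file, gzip-compressed or plain.

    Assumes exactly 4 lines per record (no line-wrapped sequences).
    """
    # Pick the opener from the file extension; .gz is streamed, never unpacked to disk.
    opener = gzip.open if path.endswith(".gz") else open
    lines = 0
    with opener(path, "rb") as handle:
        for _ in handle:
            lines += 1
    return lines // 4
```

On files of this size the count is still bound by disk read speed and decompression, so expect it to take a while; it says nothing about how long alignment will take.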
You can look into Amazon AWS and Google Compute to get an idea of pricing for online compute resources.
Just a rough estimate... You have 340 GB of fastq (170 x 2; I assume this is uncompressed). Aligned and in BAM format this may be ~50 GB. To sort it you need another 50 GB for the temporary files and 50 GB for the final sorted BAM. You could pipe the output of the aligner (say bwa mem) into samtools sort so you save time and ~50 GB of disk space. Once done, you can delete the fastq files and the unsorted BAM, if any, and you finish with ~50 GB of BAM and ~1/2 TB of peak disk space usage. Of course, 340 GB uncompressed could be reduced to maybe even 1/10 of that size with gzip.
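To make the arithmetic in that estimate explicit, here is a small sketch. The function name peak_disk_gb is hypothetical, the ~50 GB BAM size and the 1/10 gzip ratio are the rough guesses from the estimate above, and it assumes the worst case where the sort's temporary files and the final BAM exist on disk at the same time.

```python
def peak_disk_gb(fastq_gb, bam_gb, pipe_into_sort=True):
    """Rough peak disk usage in GB for align + sort.

    Worst-case assumption: sort temp files and the final sorted BAM
    coexist, each about the size of the BAM.
    """
    # Piping the aligner into the sorter means no unsorted BAM is written.
    unsorted_bam = 0 if pipe_into_sort else bam_gb
    sort_tmp = bam_gb
    sorted_bam = bam_gb
    return fastq_gb + unsorted_bam + sort_tmp + sorted_bam


print(peak_disk_gb(340, 50, pipe_into_sort=False))  # 490, i.e. ~1/2 TB
print(peak_disk_gb(340, 50, pipe_into_sort=True))   # 440
print(peak_disk_gb(34, 50, pipe_into_sort=True))    # 134, fastq gzipped to ~1/10
```

All three cases fit on a 1 TB disk, but keeping the fastq gzipped leaves far more headroom.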
On a Broadwell Xeon node (2.4GHz I think) with 128GB RAM, processing a 2x100bp WGS sample with 635,658,231 read pairs takes 7-9 hours, using BWA mem with 24 threads, piped into SAMBLASTER for duplicate marking, and a sort with SAMBAMBA using 30G of memory. As you are limited to 16GB RAM, you'll probably need to limit BWA to 8 threads or so, if your machine has that capacity. Still, it will probably take an entire day.
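For a 16GB machine, a scaled-down version of that pipeline might look like the sketch below. It only builds the command line as a string; it does not run anything. Note that it swaps in samtools sort (as suggested earlier in the thread) in place of SAMBLASTER and SAMBAMBA and leaves out duplicate marking. The function name, file names, and the 8-thread, 2 x 1G sort settings are illustrative assumptions, not tested recommendations. samtools sort's -m is per thread, so the sort here would use roughly 2G on top of what bwa needs.

```python
import shlex


def build_align_sort_command(reference, fastq_1, fastq_2, out_bam,
                             bwa_threads=8, sort_threads=2,
                             sort_mem_per_thread="1G"):
    """Build a 'bwa mem | samtools sort' shell pipeline as a string.

    Thread and memory defaults are guesses for a 16GB machine.
    """
    align = ["bwa", "mem", "-t", str(bwa_threads),
             reference, fastq_1, fastq_2]
    sort = ["samtools", "sort",
            "-@", str(sort_threads),      # sorting threads
            "-m", sort_mem_per_thread,    # memory per sorting thread
            "-T", out_bam + ".tmp",       # prefix for temporary files
            "-o", out_bam,
            "-"]                          # read the alignments from stdin
    return shlex.join(align) + " | " + shlex.join(sort)


# Hypothetical file names, for illustration only.
print(build_align_sort_command("ref.fa", "r1.fq.gz", "r2.fq.gz", "out.bam"))
```

The printed line can be pasted into a shell once the reference has been indexed with bwa index. Both inputs can stay gzipped, since bwa mem reads compressed fastq directly.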
It is paired-end genomic data for a single human sample, so there are two fastq files of 170G each. Internet connectivity is required only to download the dataset.