I have received 384 fastq.gz files. They come from paired-end sequencing, so there are 2 files per patient, i.e. 192 patients. I am new to NGS data analysis and would like to start with FastQC. What would be the best way to proceed?
I know FastQC can be run graphically, but with that many samples it would presumably be best to use the command line.
I have read in some places that merging all samples into a single file (or 2 files for paired-end data) might be the solution. Is that recommended? Or should I just use simple bash scripting like the loop below (or something similar)?
for i in *.fastq.gz; do bsub < fastqc_script_with_commands.sh; done
I guess I'm just curious whether the convention is to merge fastq files or to keep them separate (1 or 2 per sample).
Thanks
Use GNU parallel or Snakemake.
Or Nextflow. Examples of using FastQC inside a Nextflow pipeline: here, here and here.
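To make the GNU parallel suggestion above a bit more concrete, here is a minimal sketch rather than a definitive recipe. It keeps the files separate, so each fastq.gz gets its own FastQC report, and runs several files at once. The function name `run_fastqc_parallel` and the `FASTQC_BIN` variable are hypothetical names made up for this example, and the `xargs -P` fallback is my own addition for machines without GNU parallel. It assumes the files end in `.fastq.gz` and sit directly in one directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run FastQC on every *.fastq.gz file in a directory,
# one file per job, several jobs at a time. Files are NOT merged.
# FASTQC_BIN is an assumed override for the FastQC executable (default: fastqc).
# Note: paths containing spaces are not handled carefully in this sketch.

run_fastqc_parallel() {
    local in_dir="$1"        # directory holding the *.fastq.gz files
    local out_dir="$2"       # directory where FastQC writes its reports
    local jobs="${3:-4}"     # number of files processed at the same time
    local bin="${FASTQC_BIN:-fastqc}"

    mkdir -p "$out_dir" || return 1

    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        # GNU parallel: {} is replaced by one file name per job.
        find "$in_dir" -maxdepth 1 -name '*.fastq.gz' -print0 |
            parallel -0 -j "$jobs" "$bin" -o "$out_dir" {}
    else
        # Fallback when GNU parallel is not installed: xargs appends
        # one file name per invocation and runs up to $jobs at once.
        find "$in_dir" -maxdepth 1 -name '*.fastq.gz' -print0 |
            xargs -0 -n 1 -P "$jobs" "$bin" -o "$out_dir"
    fi
}
```

After sourcing the file, something like `run_fastqc_parallel /path/to/fastq fastqc_reports 8` would process 8 files at a time. FastQC also accepts several input files together with its own `-t` (threads) option, so on a single machine `fastqc -t 8 -o fastqc_reports *.fastq.gz` may already be enough.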