I am new to Linux. I need to run FastQC on 500 FASTQ files, so I thought I would use Linux to run them as a batch and pass the results to MultiQC. Could you help me with the Linux commands to do this? Thanks in advance.
There are a lot of answers to this question - the best will depend on the details of your exact situation (single workstation or cluster? Doing it once or doing it routinely?). The simplest that I can think of is to just run the jobs in bash. So for example:
# Run FastQC on all the FastQ files
for f in *.fastq.gz
do
    # Quote the variable so filenames containing spaces are handled correctly
    fastqc "$f"
done
# Run MultiQC on the results
multiqc .
This is simple but it will be fairly slow, as it runs FastQC on one file at a time. Depending on the size of your compute setup, it may be better to run a dedicated pipeline tool such as Nextflow or Snakemake. I work with Nextflow and the nf-core community, and the template pipeline that we base new pipelines on does basically what you're asking for, so you could even use that: run nf-core create to make a new pipeline.
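If you are on a single workstation and don't want to set up a workflow manager yet, a middle ground is to let FastQC process several files at once. Below is a minimal sketch, not a definitive recipe. The function name run_qc, the qc_reports output directory and the default of 4 threads are my own illustrative choices, and it assumes your files end in .fastq.gz and that fastqc and multiqc are on your PATH.

```bash
#!/usr/bin/env bash
# Sketch: run FastQC on several files at once, then summarise with MultiQC.
# run_qc, qc_reports and the default thread count are illustrative assumptions.

run_qc() {
    local outdir="${1:-qc_reports}"   # where the reports are written
    local threads="${2:-4}"           # number of files FastQC processes at once

    # Collect matching files; nullglob leaves the array empty if none match
    shopt -s nullglob
    local files=( *.fastq.gz )
    shopt -u nullglob

    if [ "${#files[@]}" -eq 0 ]; then
        echo "No .fastq.gz files found in $(pwd)" >&2
        return 1
    fi

    # FastQC expects the output directory to exist already
    mkdir -p "$outdir"

    # -t: files processed simultaneously; -o: output directory
    fastqc -t "$threads" -o "$outdir" "${files[@]}" || return 1

    # Collate every FastQC report in outdir into one MultiQC report
    multiqc -o "$outdir" "$outdir"
}
```

To use it, save the function in a file, source it, change into the directory holding the FASTQ files and call, for example, run_qc qc_reports 8. Each FastQC thread needs its own memory, so pick a thread count that fits your machine rather than simply the number of cores.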
A few points/queries regarding your question:
When you say you have 500 raw sequence reads, do you mean you have 500 fastq samples, or one sample with 500 reads?
If it is the latter, you don't need MultiQC. MultiQC is specifically meant for cases where you have multiple samples and need to collate their individual QC reports into one combined report.
Sorry, I meant I have 500 FASTQ files.