I am new to Snakemake and am trying to run FastQC on multiple paired-end samples. The fastq.gz files are named {sample}_L001_R{read}_001.fastq.gz, where {sample} is the sample name and {read} is either 1 or 2. I want to combine the individual FastQC reports into one report using MultiQC. However, I keep getting an error saying that I am missing a comma in line 29. I am running this on my university's high-performance computing cluster, so I have to run these commands before running the script: module load miniconda/4.12.0, module load snakemake/7.17.1, and conda activate snakemake-7.17.1. My Snakefile is below. I could use all the help I can get!
# Create a list of strings containing all of our sample names
SAMPLES = ['DC10J_S4', 'DC11K_218_359_S6', 'DC12L_S5', 'DC1A_278_299_S1', 'DC2B_S1', 'DC3C_266_311_S2', 'DC5E_254_323_S3', 'DC6F_S2', 'DC7G_242_335_S4', 'DC8H_S3', 'DC9I_230_347_S5']
READS = ['1', '2']

rule all:
    input:
        expand("/projectnb/altcells/ribosomal-profiling/data/{sample}_L001_R{read}_001.fastq.gz", sample=SAMPLES, read=READS)
        #"/projectnb/altcells/ribosomal-profiling/data/FastQC_output/multiqc_report.html"

# run fastqc
# Rule to generate FastQC reports for each sample
rule fastqc:
    input:
        fastq1 = "/projectnb/altcells/ribosomal-profiling/data/{sample}_L001_R{read}_001.fastq.gz",
        fastq2 = "/projectnb/altcells/ribosomal-profiling/data/{sample}_L001_R{read}_001.fastq.gz" # THIS IS LINE 29. MISSING A COMMA HERE?
    output:
        fastqc_report1 = "/projectnb/altcells/ribosomal-profiling/data/FastQC_output/{sample}_L001_R1_001_fastqc.html",
        fastqc_report2 = "/projectnb/altcells/ribosomal-profiling/data/FastQC_output/{sample}_L001_R2_001_fastqc.html",
        fastqc_zip1 = "/projectnb/altcells/ribosomal-profiling/data/FastQC_output/{sample}_L001_R1_001_fastqc.zip",
        fastqc_zip2 = "/projectnb/altcells/ribosomal-profiling/data/FastQC_output/{sample}_L001_R2_001_fastqc.zip"
    shell: """
        module load fastqc/0.11.7
        fastqc {input.fastq1} -o /projectnb/altcells/ribosomal-profiling/data/FastQC_output &&
        fastqc {input.fastq2} -o /projectnb/altcells/ribosomal-profiling/data/FastQC_output
        """

# Rule to aggregate all FastQC reports into a single HTML file
rule aggregate_fastqc:
    input:
        expand("/projectnb/altcells/ribosomal-profiling/data/FastQC_output/{sample}_L001_R{read}_001_fastqc.html", sample=SAMPLES, read=READS)
    output:
        "/projectnb/altcells/ribosomal-profiling/data/FastQC_output/multiqc_report.html"
    shell: """
        module load multiqc &&
        multiqc /projectnb/altcells/ribosomal-profiling/data/FastQC_output -o /projectnb/altcells/ribosomal-profiling/data/FastQC_output
        """
Sorry, that was a typo from when I was copying my code! I still get the same error.
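Separately from the comma error, one thing in the fastqc rule may be worth checking: both input patterns contain {read}, while the outputs only contain {sample}. Snakemake works out input wildcards from the output file it is asked to produce, so a wildcard that appears only in the input cannot be filled in. The sketch below is plain Python, not Snakemake's own code. It compares the wildcard names in the two sets of patterns. The helper names wildcard_names and unresolved_wildcards are hypothetical, and the regex is a simplification that ignores wildcard constraints.

```python
import re

# Patterns copied from the fastqc rule in the Snakefile above.
DATA = "/projectnb/altcells/ribosomal-profiling/data"
input_patterns = [
    DATA + "/{sample}_L001_R{read}_001.fastq.gz",
]
output_patterns = [
    DATA + "/FastQC_output/{sample}_L001_R1_001_fastqc.html",
    DATA + "/FastQC_output/{sample}_L001_R2_001_fastqc.html",
]


def wildcard_names(patterns):
    """Collect the {name} placeholders used across a list of patterns."""
    names = set()
    for pattern in patterns:
        # Simplification: matches plain {name} only, not {name,constraint}.
        names.update(re.findall(r"\{(\w+)\}", pattern))
    return names


def unresolved_wildcards(inputs, outputs):
    """Hypothetical helper: wildcards the inputs use but the outputs lack."""
    return wildcard_names(inputs) - wildcard_names(outputs)


missing = unresolved_wildcards(input_patterns, output_patterns)
print(sorted(missing))
```

If this turns out to be a problem once the syntax error is resolved, one possible fix is to write R1 and R2 literally in fastq1 and fastq2, so that the rule only depends on {sample}. This is a guess based on the code as posted, not a confirmed cause of the comma error.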