I have many paired FASTQ files, and I ran into a problem after running trim_galore, because it names its output files with the suffixes _1_val_1 and _2_val_2, for example AD50_CTGATCGTA_1_val_1.fq.gz and AD50_CTGATCGTA_2_val_2.fq.gz.
I would like to continue with Snakemake and use:
import os
import snakemake.io
import glob

DIR="AD50/"
(SAMPLES,READS,) = glob_wildcards(DIR+"{sample}_{read}.fq.gz")
READS=["1","2"]

rule all:
    input:
        expand(DIR+"{sample}_dedup_{read}.fq.gz",sample=SAMPLES,read=READS)

rule clumpify:
    input:
        r1=DIR+"{sample}_1_val_1.fq.gz",
        r2=DIR+"{sample}_2_val_2.fq.gz"
    output:
        r1out=DIR+"{sample}_dedup_1.fq.gz",
        r2out=DIR+"{sample}_dedup_2.fq.gz"
    shell:
        "clumpify.sh in={input.r1} in2={input.r2} out={output.r1out} out2={output.r2out} dedupe subs=0"
The error is:
Building DAG of jobs...
MissingInputException in line 13 of /home/peterchung/Desktop/Rerun-Result/clumpify.smk:
Missing input files for rule clumpify:
AD50/AD50_CTGATCGTA_2_val_2_val_2.fq.gz
AD50/AD50_CTGATCGTA_2_val_1_val_1.fq.gz
I tried another approach, and the closest I got was that it reported missing inputs such as AD50_CTGATCGTA_1_val_2.fq.gz and AD50_CTGATCGTA_2_val_1.fq.gz, which do not exist.
I am not sure I used the glob_wildcards function properly, since there are many underscores in the file names. I tried:
glob_wildcards(DIR+"{sample}_{read}_val_{read}.fq.gz")
but it did not work either.
My file names are AD50_CTGATCGTA_1_val_1.fq.gz and AD50_CTGATCGTA_2_val_2.fq.gz.
My command line is:
I think my glob_wildcards call is incorrect. It recognises AD50_CTGATCGTA_1_val and AD50_CTGATCGTA_2_val as the sample, and that is why it reports AD50/AD50_CTGATCGTA_2_val_2_val_2.fq.gz and AD50/AD50_CTGATCGTA_2_val_1_val_1.fq.gz as missing.
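The split described here can be reproduced outside Snakemake. The following is a minimal sketch, assuming that each unconstrained wildcard behaves like a greedy `.+` regex group. The regex and the helper name `split_name` are hand-written stand-ins for illustration, not Snakemake's actual internals.

```python
import re

# Hand-written stand-in for the pattern "{sample}_{read}.fq.gz",
# assuming each unconstrained wildcard behaves like a greedy ".+" group.
PATTERN = re.compile(r"(?P<sample>.+)_(?P<read>.+)\.fq\.gz")

def split_name(filename):
    """Return (sample, read) as the greedy pattern would capture them."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return None
    return match.group("sample"), match.group("read")

for name in ["AD50_CTGATCGTA_1_val_1.fq.gz", "AD50_CTGATCGTA_2_val_2.fq.gz"]:
    sample, read = split_name(name)
    # The rule's input template "{sample}_1_val_1.fq.gz" appends the suffix again.
    print(sample, read, sample + "_1_val_1.fq.gz")
```

Because the first group is greedy, the split happens at the last underscore, so the sample keeps the `_1_val` or `_2_val` part and the rule's input template then adds `_1_val_1` on top of it.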
What I want is for only AD50_CTGATCGTA to be recognised as the sample, and for the reads to be matched as _1_val_1 and _2_val_2 rather than _1_val_2 or _2_val_1, which do not exist. Thanks for your reply.
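One way to express this, as a sketch rather than a confirmed fix, is to match only on the fixed `_1_val_1.fq.gz` suffix, so that the sample is everything before it, and then build both file names from the sample. In a Snakefile this would presumably correspond to something like `SAMPLES, = glob_wildcards(DIR + "{sample}_1_val_1.fq.gz")`. The plain-Python helpers below (`sample_of` and `trimmed_pair` are hypothetical names) only illustrate the matching.

```python
import re

# Stand-in for the pattern "{sample}_1_val_1.fq.gz": the trim_galore suffix
# is literal text, so only the part before it is captured as the sample.
R1_PATTERN = re.compile(r"(?P<sample>.+)_1_val_1\.fq\.gz")

def sample_of(filename):
    """Return the sample name for a read-1 file, or None for any other file."""
    match = R1_PATTERN.fullmatch(filename)
    return match.group("sample") if match else None

def trimmed_pair(sample):
    """Build the two trim_galore file names that belong to one sample."""
    return (sample + "_1_val_1.fq.gz", sample + "_2_val_2.fq.gz")

files = ["AD50_CTGATCGTA_1_val_1.fq.gz", "AD50_CTGATCGTA_2_val_2.fq.gz"]
samples = [s for s in map(sample_of, files) if s is not None]
print(samples)
print(trimmed_pair(samples[0]))
```

Collecting samples from the read-1 files alone avoids the mixed `_1_val_2` and `_2_val_1` combinations, since the read number is never a free wildcard.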