Hello,
I have a problem with the relative paths of the input/output files in a Snakefile rule that creates softlinks to files in the same directory.
The workdir is defined in the configuration, and the directory structure looks like this:
/store/proj/hapmap/John/bowtie2/
|
|-configs/config.yaml
|-rules/
|   |-rule01.smk
|   |-...
|   |-rule04.smk
|-data/seqs_trimmed/
    |-CN295_R1_val_1.fq.gz
    |-CN295_R2_val_2.fq.gz
    |-......
# Snakefile is rule04.smk, located in /store/proj/hapmap/John/bowtie2/rules
configfile: "../configs/config.yaml"    # /store/proj/hapmap/John/bowtie2/configs
workdir: config['dir_project']          # /store/proj/hapmap/John/bowtie2
DIR_OUT = "data/seqs_trimmed/"

rule softlink_PE:
    input:
        R1 = DIR_OUT + '{sample}_R1_val_1.fq.gz',
        R2 = DIR_OUT + '{sample}_R2_val_2.fq.gz'
    output:
        R1 = DIR_OUT + '{sample}_trimmed_PE_R1.fq.gz',
        R2 = DIR_OUT + '{sample}_trimmed_PE_R2.fq.gz'
    shell:
        """
        ln -s {input.R1} {output.R1}
        ln -s {input.R2} {output.R2}
        """
The softlinks were created successfully, but they point to the wrong source files because of the relative path:
CN295_trimmed_PE_R1.fq.gz -> data/seqs_trimmed/CN295_R1_val_1.fq.gz
CN295_trimmed_PE_R2.fq.gz -> data/seqs_trimmed/CN295_R2_val_2.fq.gz
They should instead point to the files located in the same folder, i.e.:
CN295_trimmed_PE_R1.fq.gz -> CN295_R1_val_1.fq.gz
CN295_trimmed_PE_R2.fq.gz -> CN295_R2_val_2.fq.gz
The Snakemake manual FAQ reads:
Relative paths in Snakemake are interpreted depending on their context.
Input, output, log, and benchmark files are considered to be relative to the working directory (either the directory in which you have invoked Snakemake or whatever was specified for --directory or the workdir: directive).
But it is still unclear to me how to resolve the issue. I am confused about relative paths in Snakemake, although I am aware that softlinks created with shell commands can be tricky when relative paths are involved. Any idea for correcting this Snakefile is appreciated.
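The behaviour described above can be reproduced outside Snakemake: the target of a relative symlink is resolved against the directory that contains the link, not against the working directory. Below is a minimal Python sketch of that, assuming a POSIX system. It rebuilds the data/seqs_trimmed layout in a temporary directory; the link names bad_link.fq.gz and good_link.fq.gz and the function demo are made up for illustration.

```python
import os
import tempfile


def demo():
    """Return (bad_resolves, good_resolves) for two ways of writing the link target."""
    with tempfile.TemporaryDirectory() as workdir:
        data_dir = os.path.join(workdir, "data", "seqs_trimmed")
        os.makedirs(data_dir)
        # Empty stand-in for the real read file.
        open(os.path.join(data_dir, "CN295_R1_val_1.fq.gz"), "w").close()

        # Target written relative to the workdir, as in the rule above.
        # It is stored verbatim and later resolved from data/seqs_trimmed/,
        # i.e. as data/seqs_trimmed/data/seqs_trimmed/..., which does not exist.
        bad = os.path.join(data_dir, "bad_link.fq.gz")
        os.symlink("data/seqs_trimmed/CN295_R1_val_1.fq.gz", bad)

        # Target written relative to the directory holding the link.
        good = os.path.join(data_dir, "good_link.fq.gz")
        os.symlink("CN295_R1_val_1.fq.gz", good)

        # os.path.exists follows symlinks, so a dangling link gives False.
        return os.path.exists(bad), os.path.exists(good)


print(demo())  # (False, True)
```

So Snakemake is interpreting the input and output paths as documented; it is the text stored inside the link that needs to be relative to the link's own location.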
Further to this answer: if you are using GNU coreutils (i.e. any modern Linux), "ln" has a "-r" (--relative) flag that fixes this problem by making the link target relative to the link's location. I typically use "ln -snrf" when using "ln" in scripts (including Snakefiles).
If you are on Mac you have the more basic BSD version of "ln", but you can install the GNU version via Homebrew (its coreutils package installs it as "gln" by default). You can probably also install it via conda (I've not checked), which would make life simple if you are already installing Snakemake as a conda package.
That's really handy, and much more of a general solution than stripping off the directory names! So yifangt86, if you're on Linux you can probably just
ln -sr {input.R1} {output.R1}
and call it a day.
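For anyone without GNU "ln", roughly the same effect can be sketched in Python with os.path.relpath. This is only an approximation of "ln -srf", not how coreutils implements it: relpath works on the path text alone and does not resolve symlinks inside the paths. The helper name relative_symlink and the demo function are hypothetical, and a POSIX system is assumed.

```python
import os
import tempfile


def relative_symlink(src, dst):
    """Create dst as a symlink to src, storing the target relative to dst's directory."""
    target = os.path.relpath(src, start=os.path.dirname(dst) or ".")
    # Replace an existing file or link at dst, similar in spirit to `ln -f`.
    if os.path.lexists(dst):
        os.remove(dst)
    os.symlink(target, dst)
    return target


def demo():
    """Link the trimmed R1 file from the question inside a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        data_dir = os.path.join(workdir, "data", "seqs_trimmed")
        os.makedirs(data_dir)
        src = os.path.join(data_dir, "CN295_R1_val_1.fq.gz")
        dst = os.path.join(data_dir, "CN295_trimmed_PE_R1.fq.gz")
        open(src, "w").close()  # empty stand-in for the real read file
        target = relative_symlink(src, dst)
        return target, os.readlink(dst), os.path.exists(dst)


print(demo())  # ('CN295_R1_val_1.fq.gz', 'CN295_R1_val_1.fq.gz', True)
```

In a Snakefile, a helper like this could be called from a "run:" block in place of the "shell:" commands, passing input.R1 and output.R1, which avoids depending on which "ln" is installed.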