Hi all, I am trying to run the ENCODE ATAC-seq pipeline on an HPC cluster, but after reading their instructions I am still not sure of the correct command. https://github.com/ENCODE-DCC/atac-seq-pipeline
If you wanted to run your own data, what would you put in INPUT_JSON?
INPUT_JSON="https://storage.googleapis.com/encode-pipeline-test-samples/encode-atac-seq-pipeline/ENCSR356KRQ_subsampled.json"
caper hpc submit atac.wdl -i "${INPUT_JSON}" --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
Thank you so much!
Thank you for the instructions! I see that besides the JSON file there is also an "input file", which confuses me, because we have already defined the fastq.gz files in the JSON file. Would you also explain what value I should put for the adapter if it is Illumina? I don't understand why they use that value.
The "Input files" section gives details on how you can specify your fastqs in the JSON. (You specify all pipeline parameters in the JSON.) Essentially, the pipeline aims to be flexible and accept a wide range of input files, so there are many ways to specify your inputs WITHIN the JSON.
For the "adapters" section, you can read the instructions and follow them (either specify the adapters manually or use the auto-detect feature). I suggest just turning the auto-detect feature on:
"atac.auto_detect_adapter": true
and ignoring (excluding from the JSON file) all other atac.adapters keys, unless of course you have custom or different adapters.

So is it redundant in this case to specify the paths to the fastq.gz files in both the input file and the JSON file? The JSON file is .json; what is the file format of the input file? Thank you for your help!
Curious, can you give specific examples of what you refer to as "input file" and "json file"?
In the screenshot you shared, "input files" refers to input fastq files, input BAMs, or other types of input sequencing/mapping files. The JSON file contains information specific to your experiment and tells the pipeline where everything is. The JSON file is the "input" to the pipeline submission command, but it contains the locations of your actual "input files" and other relevant files.
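To make the distinction concrete, here is a minimal sketch of an input JSON written from the shell. The key names follow the pipeline's input JSON documentation as I understand it, so check them against the current docs. The file name, title, genome TSV location, and fastq paths are all hypothetical placeholders.

```shell
# Write a minimal input JSON for one paired-end replicate.
# Every path and the title below are placeholders; replace them with your own.
cat > my_json.json <<'EOF'
{
  "atac.title": "my_experiment",
  "atac.genome_tsv": "/path/to/genome/hg38.tsv",
  "atac.paired_end": true,
  "atac.auto_detect_adapter": true,
  "atac.fastqs_rep1_R1": ["/path/to/rep1_R1.fastq.gz"],
  "atac.fastqs_rep1_R2": ["/path/to/rep1_R2.fastq.gz"]
}
EOF

# Quick sanity check: count the lines that define fastq keys.
grep -c '"atac.fastqs_' my_json.json
```

The fastq paths appear only inside the JSON. There is no second "input file" to prepare.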
Sure, here is my JSON file. I have 8 fastq files:
I am not sure what to put in the input file. Is it something like this?
What input file are you referring to? Your fastqs are given in the json?
Yes, I think that is what the pipeline wants. I couldn't find tutorials or videos that are easy for new users to follow, so just reading the instructions is still difficult for me.
Yes, it is a lot to take in.
In your input json, is "atac.fastqs_wt_rep2_R2" valid? This key should match exactly what is given in the examples.
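As a rough way to check this yourself, the sketch below lists fastq keys that do not follow the pattern used in the documented examples. I am assuming that pattern is `atac.fastqs_rep<N>_R1` / `atac.fastqs_rep<N>_R2`. The file `check_me.json` and its paths are made up for illustration.

```shell
# Hypothetical JSON with one well-formed key and the key from this thread.
cat > check_me.json <<'EOF'
{
  "atac.fastqs_rep1_R1": ["/path/to/rep1_R1.fastq.gz"],
  "atac.fastqs_wt_rep2_R2": ["/path/to/rep2_R2.fastq.gz"]
}
EOF

# Pull out every quoted key that starts with atac.fastqs, then keep only the
# ones that do NOT match the assumed pattern atac.fastqs_rep<N>_R1 or _R2.
bad_keys=$(grep -oE '"atac\.fastqs[^"]*"' check_me.json \
  | grep -vE '^"atac\.fastqs_rep[0-9]+_R[12]"$' || true)

if [ -n "$bad_keys" ]; then
  echo "Unexpected fastq keys:"
  echo "$bad_keys"
else
  echo "All fastq keys match the expected pattern."
fi
```

Here the extra `wt_` in the second key is what gets flagged. Sample labels like that would need to live somewhere other than the key name.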
Once your JSON is ready, use it to start the pipeline. The fastqs you specified are the only input files you need to worry about. If your JSON is in the current directory and named "my_json.json", it could look like this:
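A sketch of that submission, reusing the command from the original post with the local JSON substituted. It assumes `atac.wdl` is in the current directory and that you are running with Singularity, as in that command. The leader job name `my_atac_run` is just an arbitrary label I made up.

```shell
INPUT_JSON="my_json.json"
LEADER_JOB_NAME="my_atac_run"   # hypothetical label; pick any name you like

# Build the command as a string and print it first so it can be checked.
CMD="caper hpc submit atac.wdl -i ${INPUT_JSON} --singularity --leader-job-name ${LEADER_JOB_NAME}"
echo "${CMD}"

# When it looks right, run the printed command directly on the cluster.
```

Printing the command before running it is only a precaution. Nothing is submitted until you run the printed line yourself.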
Hi @rfran010. What do you mean by valid? Could you give an example of what a matching key looks like?
Please see the screenshot you sent of input files.
Do you mean it has to be like:
I believe this part is incorrect