Automating a Workflow with Multiple Inputs and Differing Variables
5.0 years ago

Hello,

I am new to the command line and would like to simplify and automate my workflow. In brief, I performed paired-end deep sequencing of 25 Cas9 off-target sites across multiple samples. The inputs that change from sample to sample are the input files (PE1 and PE2, i.e. the R1 and R2 FASTQ files), the amplicon sequence, and the gRNA sequence (the predicted off-target gRNA sequence, which differs slightly for each off-target site).

Here are the two paired-end file names for one of the samples:

3_S2_L001_R1_001.fastq 3_S2_L001_R2_001.fastq
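Because the R1 and R2 names differ only in the read number, one way to handle "two file inputs at the same time" is to loop over the R1 files only and derive each mate's name from it. A minimal sketch, assuming the compressed `_R1_001.fastq.gz` / `_R2_001.fastq.gz` naming used in the command below (change the suffix if your files are uncompressed `.fastq`); `list_pairs` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "sample<TAB>R1<TAB>R2" for every R1 file in a directory.
# Assumes Illumina-style names such as 3_S2_L001_R1_001.fastq.gz.
list_pairs() {
  local dir="${1:-.}" r1 r2 sample
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    [[ -e "$r1" ]] || continue                           # the glob matched nothing
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"          # swap the R1 suffix for R2
    sample="$(basename "$r1" _R1_001.fastq.gz)"          # e.g. 3_S2_L001
    if [[ ! -e "$r2" ]]; then
      echo "warning: no R2 mate for $r1" >&2             # report and skip unpaired files
      continue
    fi
    printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
  done
}
```

Running `list_pairs` in the sequencing directory prints one line per sample; R1 files without a mate are reported on stderr and skipped.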

I am using CRISPResso2 for the off-target analysis. Here is the command I ran for one sample (2_S1):

docker run -v ${PWD}:/DATA -w /DATA -i pinellolab/crispresso2 CRISPResso --fastq_r1 2_S1_L001_R1_001.fastq.gz --fastq_r2 2_S1_L001_R2_001.fastq.gz --amplicon_seq ATAAAAACCATACACATTCAGTGGGAAACCTTCAGCCATAGAGAAGTATAGGCAGGGTGCAGCTGATTGCTCTGTCTTTGGGCAATTTAGCTTTTAGGCCAGAGGCCACAGATGGGTAGCCTGGTGTGTGCCTAGGGTGTTTTTGTTTGGCTGGCGCAATATTTTTTAAAACTGTAAGTTTATTGCCAGCATTTAA -g TTAGGCCAGAGGCCACAGAT -q 30 -s 30 --min_bp_quality_or_N 30 -w 3

I have no clue how to go about writing a script for this. I was thinking I could put the amplicon and gRNA sequences in separate arrays, in the same order as the files, and reference those variables in the command instead, but I'm unsure how to write a for loop that takes two input files at the same time. I'm most likely going about this the wrong way, so any help would be appreciated.
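The parallel-array idea does work in bash: loop over the array indices with `"${!samples[@]}"` and use the same index `i` for every array, so entry `i` of each array belongs to sample `i`. A minimal sketch, assuming the file names follow the pattern of the command above; the sample prefixes and placeholder sequences (`AMPLICON_FOR_SAMPLE_2`, etc.) are hypothetical, and `DRY_RUN` is a variable introduced here, not a CRISPResso option:

```shell
#!/usr/bin/env bash
# Entry i of each array belongs to sample i. Placeholder values: replace with your own.
samples=(2_S1_L001 3_S2_L001)
amplicons=(AMPLICON_FOR_SAMPLE_2 AMPLICON_FOR_SAMPLE_3)
guides=(GUIDE_FOR_SAMPLE_2 GUIDE_FOR_SAMPLE_3)

run_from_arrays() {
  local i
  # Refuse to run if the arrays have different lengths (an easy copy-paste slip).
  if (( ${#samples[@]} != ${#amplicons[@]} || ${#samples[@]} != ${#guides[@]} )); then
    echo "error: samples, amplicons and guides must have the same length" >&2
    return 1
  fi
  for i in "${!samples[@]}"; do
    # If DRY_RUN is set to any non-empty value, print the command instead of running it.
    ${DRY_RUN:+echo} docker run -v "${PWD}:/DATA" -w /DATA -i pinellolab/crispresso2 \
      CRISPResso \
      --fastq_r1 "${samples[i]}_R1_001.fastq.gz" \
      --fastq_r2 "${samples[i]}_R2_001.fastq.gz" \
      --amplicon_seq "${amplicons[i]}" \
      -g "${guides[i]}" \
      -q 30 -s 30 --min_bp_quality_or_N 30 -w 3
  done
}
```

Add `DRY_RUN=1 run_from_arrays` at the end of the script to print one command per sample and check them, then replace it with `run_from_arrays` to execute.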

Tags: next-gen sequencing, crispresso, automation

Well, the straightforward way is to write a pipeline and then execute it for each pair of files. Take a look at the leading workflow frameworks: Nextflow, WDL, and Snakemake.
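If you are not ready to adopt one of those frameworks, a lighter option is a plain bash loop over a sample sheet with one row per sample, so the file prefix, amplicon, and guide for a sample sit on the same line and cannot drift out of order the way separate arrays can. This is not Nextflow, WDL, or Snakemake, just a minimal sketch; the tab-separated sheet layout, the `run_sample_sheet` name, and `DRY_RUN` are assumptions introduced here. Because `docker run -i` can read standard input, the sheet is read on file descriptor 3 so Docker does not swallow the remaining rows:

```shell
#!/usr/bin/env bash
# Hypothetical sheet layout, one row per sample:
#   sample_prefix <TAB> amplicon_seq <TAB> guide_seq
# Rows starting with '#' and blank rows are ignored.
run_sample_sheet() {
  local sheet="$1" prefix="" amplicon="" guide=""
  # Read on fd 3 so "docker run -i" cannot consume the sheet via stdin;
  # the "|| [[ -n ... ]]" keeps a last row that has no trailing newline.
  while IFS=$'\t' read -r -u 3 prefix amplicon guide || [[ -n "$prefix" ]]; do
    guide=${guide%$'\r'}                                   # tolerate Windows line endings
    [[ -z "$prefix" || "$prefix" == '#'* ]] && continue    # skip blank and comment rows
    # If DRY_RUN is set to any non-empty value, print the command instead of running it.
    ${DRY_RUN:+echo} docker run -v "${PWD}:/DATA" -w /DATA -i pinellolab/crispresso2 \
      CRISPResso \
      --fastq_r1 "${prefix}_R1_001.fastq.gz" \
      --fastq_r2 "${prefix}_R2_001.fastq.gz" \
      --amplicon_seq "$amplicon" \
      -g "$guide" \
      -q 30 -s 30 --min_bp_quality_or_N 30 -w 3
  done 3< "$sheet"
}
```

For example, `DRY_RUN=1 run_sample_sheet samples.tsv` previews the commands and `run_sample_sheet samples.tsv` runs them; the sheet can be exported from a spreadsheet as tab-separated text.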
