5 weeks ago
sodiumnitrate
I'm new to Nextflow and I'm trying to figure out how to:
- Take a large .fasta file and split it into pieces of size 1000.
- Apply the same Python script to each piece.
- Pass the list of resulting files to a process that cats them into one file.
I have something like this:
params.infile = "huge.fasta"
params.size = 1000

process filter_seq {
    input:
    file x

    output:
    path "filtered_${x}.fasta"

    script:
    """
    python $projectDir/filter.py $x filtered_${x}.fasta
    """
}

process collect_split_proc {
    publishDir 'results', model: 'copy'

    input:
    var out_name
    path x

    output:
    path out_name

    script:
    """
    < $x cat >> $out_name
    """
}

workflow {
    Channel.fromPath(params.infile) | splitFasta(by: params.size, file: true) | filter_seq
    collect_split_proc("filtered.fasta", filter_seq.out) | collectFile
}
However, this results in the error:
Process `collect_split_proc` declares 1 input channel but 2 were specified
I want to be able to pass out_name to the collect_split_proc process because I will be doing other operations down the line that also require the file to be split and then stitched back together, so I want to reuse the process. I'm also a newbie, so I suspect there's a trivial way to cat the results of splitFasta | process into a specific file. Is there such a thing? If not, what is wrong with my process?
Note: the Python script takes two command-line arguments: input_file_name and output_file_name.
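For reference, here is a minimal sketch of how the pipeline above might look with the likely problems addressed. It is untested and rests on a few assumptions: that the error comes from `var` not being a recognized input qualifier (so only `path x` is counted as an input), that `model:` was meant to be `mode:`, and that the filtered chunks need to be gathered with `collect()` so the concatenation process runs once on the whole list. The variable name `chunks` is introduced only for illustration.

```nextflow
params.infile = "huge.fasta"
params.size = 1000

process filter_seq {
    input:
    path x

    output:
    path "filtered_${x}.fasta"

    script:
    """
    python $projectDir/filter.py $x filtered_${x}.fasta
    """
}

process collect_split_proc {
    // assumption: 'mode' was intended here rather than 'model'
    publishDir 'results', mode: 'copy'

    input:
    val out_name   // 'val' passes a plain value; 'var' is not an input qualifier
    path x         // receives the full list of filtered chunk files

    output:
    path out_name

    script:
    // a list of paths is rendered as space-separated file names
    """
    cat $x > $out_name
    """
}

workflow {
    // hypothetical name 'chunks': one file per group of params.size sequences
    chunks = Channel.fromPath(params.infile).splitFasta(by: params.size, file: true)
    filter_seq(chunks)

    // collect() turns the per-chunk outputs into a single list emission
    collect_split_proc("filtered.fasta", filter_seq.out.collect())
}
```

If a reusable process is not strictly needed, the `collectFile` operator may be enough on its own, for example `filter_seq.out.collectFile(name: 'filtered.fasta', storeDir: 'results')`. In either variant the chunks are concatenated in the order they arrive, which is not guaranteed to match their order in the original file.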