Hello,
I'm trying to write a simple bwa-mem2 script in Nextflow for my research, but it keeps failing. I have two questions in this post.
The first script runs, but only the first pair of fastq files is processed. From searching online, I gather this is because index_ch is a queue channel instead of a value channel, but I'm not sure whether that is the cause or how to fix it.
nextflow.enable.dsl=2

params.genome = "./data/genome.fa"
params.reads = "./data/reads/*_{1,2}.fastq"
params.map = "map.csv"

process genome_index {
    conda "bwa-mem2"
    input:
    file(genome)
    output:
    file "genome.*"
    script:
    """
    bwa-mem2 index ${genome}
    """
}

process alignment {
    conda "bwa-mem2 samtools"
    input:
    tuple val(meta), path(reads)
    file(genome)
    file(index)
    output:
    tuple val(meta), path("*.bam"), emit: bam
    script:
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    bwa-mem2 mem\\
        -t $task.cpus \\
        $genome \\
        $reads \\
        | samtools sort --threads $task.cpus -o ${prefix}.bam -
    """
}

workflow {
    Channel
        .fromPath(params.map)
        .splitCsv(header:true)
        .map{ row ->
            metaMap = [id:row.id, single:row.single_end]
            [metaMap, [file(row.fastq1), file(row.fastq2)]]
        }
        .set{ file_ch }
    Channel
        .fromPath(params.genome)
        .set{ genome_ch }
    index_ch = genome_index(genome_ch)
    bwa_ch = alignment(file_ch, genome_ch, index_ch)
}
To work around the previous problem, I put all the index files, including "genome.fa.bwt.2bit.64", into a new folder (g_index) and created a separate channel for it. However, still only the first pair of files was processed, and the following error showed up:
... ... Command executed:
bwa-mem2 mem\ -t 1 \ genome.fa\ read_1.fastq read_2.fastq \ | samtools sort --threads 1 -o read.bam -
Command exit status: 1 ... ... ERROR! Unable to open the file: genome.fa.bwt.2bit.64
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"
Here is the second code:
params.reads = "./data/reads/*_{1,2}.fastq"
params.genome = "./data/genome.fa"
params.index = "./data/g_index/*"

process alignment {
    conda "bwa-mem2 samtools"
    input:
    tuple val(meta), path(reads)
    file(genome)
    file(index)
    output:
    tuple val(meta), path("*.bam"), emit: bam
    script:
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    bwa-mem2 mem\\
        -t $task.cpus \\
        $genome \\
        $reads \\
        | samtools sort --threads $task.cpus -o ${prefix}.bam -
    """
}

workflow {
    Channel
        .fromPath(params.map)
        .splitCsv(header:true)
        .map{ row ->
            metaMap = [id:row.id, single:row.single_end]
            [metaMap, [file(row.fastq1), file(row.fastq2)]]
        }
        .set{ file_ch }
    Channel
        .fromPath(params.genome)
        .set{ genome_ch }
    Channel
        .fromPath(params.index)
        .set{ index_ch }
    bwa_ch = alignment(file_ch, genome_ch, index_ch)
}
I've followed several basic tutorials and looked at nf-core pipelines, but writing and debugging my own script has been difficult. I'd appreciate any help in fixing the code and improving my Nextflow skills.
Cheers!
The output pattern "genome.*" in genome_index only works if the fasta is named "genome.xxx".
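As a sketch of one way to avoid depending on the fasta name, the output pattern can be built from the input variable. This assumes bwa-mem2 index writes its files next to the input and uses the full fasta name as prefix (as the "genome.fa.bwt.2bit.64" in the error message suggests). It also swaps the older file qualifier for path, which is my choice and not part of the original script.

```groovy
process genome_index {
    conda "bwa-mem2"

    input:
    path genome

    output:
    // Pattern follows the staged fasta name, e.g. genome.fa -> genome.fa.*
    path "${genome}.*"

    script:
    """
    bwa-mem2 index ${genome}
    """
}
```

With this pattern a fasta called, say, ref.fasta would produce outputs matched by "ref.fasta.*", and the fasta itself is not captured because it has no extra suffix.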
Try "each" for the first issue: https://www.nextflow.io/docs/latest/process.html#input-repeaters-each
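An alternative to "each", which also speaks to the queue vs. value channel question, is to pass the genome and the index as value channels so that every sample can reuse them. The sketch below assumes the genome_index and alignment processes from the question are unchanged. A process runs only as many times as its shortest queue channel allows, and genome_ch holds a single item, which would explain why only one pair is aligned.

```groovy
workflow {
    // Per-sample reads: a queue channel with one item per row of map.csv
    file_ch = Channel
        .fromPath(params.map)
        .splitCsv(header: true)
        .map { row ->
            def meta = [id: row.id, single: row.single_end]
            [meta, [file(row.fastq1), file(row.fastq2)]]
        }

    // A plain file object is handled as a value channel,
    // so it can be read by every alignment task
    genome = file(params.genome)

    // collect() gathers all index files into one list
    // and returns a value channel
    index_ch = genome_index(genome).collect()

    alignment(file_ch, genome, index_ch)
}
```

For the second script, where the index already exists on disk, the same idea would be index_ch = Channel.fromPath(params.index).collect(). Without collect(), the glob emits the index files one at a time, so the first task is staged with a single index file. That is consistent with the "Unable to open the file: genome.fa.bwt.2bit.64" error.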