Entering edit mode
6 months ago
myoui3122010
▴
30
New to Nextflow, the process to map using STAR is not executing.
Modules.nf
process STAR_INDEX {
tag "$genome.baseName"
input:
path genome
output:
path "genome_dir"
script:
"""
mkdir genome_dir
STAR --runMode genomeGenerate \
--genomeDir genome_dir \
--genomeFastaFiles ${genome} \
--runThreadN ${task.cpus}
"""
}
process STAR_ALIGN {
tag "$replicateId"
input:
path genome
path genomeDir
tuple val(replicateId), path(reads)
output:
tuple \
val(replicateId), \
path('Aligned.sortedByCoord.uniq.bam'), \
path('Aligned.sortedByCoord.uniq.bam.bai')
script:
"""
# ngs-nf-dev Align reads to genome
STAR --genomeDir $genomeDir \
--readFilesIn $reads \
--runThreadN $task.cpus \
--readFilesCommand zcat \
--outFilterType BySJout \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999
# Run 2-pass mapping (improve alignmets using table of splice junctions and create a new index)
STAR --genomeDir $genomeDir \
--readFilesIn $reads \
--runThreadN $task.cpus \
--readFilesCommand zcat \
--outFilterType BySJout \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--sjdbFileChrStartEnd SJ.out.tab \
--outSAMtype BAM SortedByCoordinate \
--outSAMattrRGline ID:$replicateId LB:library PL:illumina PU:machine SM:GM12878
# Select only unique alignments, no multimaps
(samtools view -H Aligned.sortedByCoord.out.bam; samtools view Aligned.sortedByCoord.out.bam|
grep -w 'NH:i:1') \
|samtools view -Sb - > Aligned.sortedByCoord.uniq.bam
# Index the BAM file
samtools index Aligned.sortedByCoord.uniq.bam
"""
}
rnaseq.nf
nextflow.enable.dsl=2
params.reads = "${launchDir}/data/*.fastq"
params.genome = "${launchDir}/genome/genome.fna"
include {
STAR_INDEX;
STAR_ALIGN;} from './modules.nf'
workflow {
reads_ch = Channel.fromFilePairs(params.reads)
// Execute processes in the workflow
STAR_INDEX(params.genome)
STAR_ALIGN(params.genome, STAR_INDEX.out, reads_ch)
}![enter image description here][1]
Do a quick
.view()
onSTAR_INDEX.out
to confirm that this channel is being populated.STAR_INDEX.out runs properly and the index files are generated properly.
Sorry for the late reply
I guess one thing to possibly try out would be to check whether
reads_ch
actually reads in the files correctly (so just something likereads_ch.view()
would be enough to check if it actually has the files parsed correctly). Additionally, it could be that you might need to transformparams.genome
to a path i.e.Channel.fromPath(params.genome)
.I agree with @dgtool. The channel is probably empty, so the process can't instantiate a task. add
checkIfExists: true
to yourfromFileParis
call.Something like:
I added the checkIfExists, the output is exactly same. The STAR_MAP is not running at all. When the script failed to read fasta files, there was different error where the process started but threw an error. Sorry for the late reply
Hm.. I see. Well, this behavior indicates that inputs are missing. There's not much more we can do without a minimal reproducible example. Maybe sharing the
nextflow.log
file will help (try running the pipeline again withNXF_TRACE=nextflow
before the command, e.g.NXF_TRACE=nextflow nextflow run nf-core/rnaseq -profile test,docker --outdir results
so that we can get a more verbose log file).Converting the genome to an actual channel was also my first hunch, but he doesn't do it for
STAR_INDEX
and there it works just fine.How does the the run end? Do you get an error or does it just hang? Firstly, I think your
reads.params
is not appropriate fromfromFilePairs
. At least in DSL1, you needed to specify how file pairs are meant to be identified. For example,.fromFilePairs("${params.InDir}/*_{R1,R2}.fastq.gz")
would then create a tuple with the structure[id, [id_R1.fastq.gz, id_R2.fastq.gz]]
.Sorry for the late reply. The script runs without any error.
This part looks weird to me with the brackets. Often you need to run the code block in shell: mode when dealing with special characters, eg using
sed 's/bla/etc/g'
Comment out and see if it starts ? In multiple years of daily NF I've never had processes just hang so it is a weird one.
Thanks for the reply, it is not reading the fasta files. I can't figure out the reason why.
Instead of
params.reads = "${launchDir}/data/*.fastq"
try
params.reads = "${launchDir}/data/*_{1,2}.fastq"
That worked, thanks a lot. Sorry for inconvenience