Nextflow process not executing
5 months ago
myoui3122010 ▴ 20

I am new to Nextflow. The process that maps reads with STAR (STAR_ALIGN) is not executing.

modules.nf

process STAR_INDEX {
  tag "$genome.baseName"

  input:
  path genome

  output:
  path "genome_dir"

  script:
  """
  mkdir genome_dir

  STAR --runMode genomeGenerate \
       --genomeDir genome_dir \
       --genomeFastaFiles ${genome} \
       --runThreadN ${task.cpus}

   """
}

process STAR_ALIGN {
tag "$replicateId"

input: 
  path genome
  path genomeDir
  tuple val(replicateId), path(reads) 

output: 
tuple \
  val(replicateId), \
  path('Aligned.sortedByCoord.uniq.bam'), \
  path('Aligned.sortedByCoord.uniq.bam.bai')

script:
"""
# ngs-nf-dev Align reads to genome
STAR --genomeDir $genomeDir \
   --readFilesIn $reads \
   --runThreadN $task.cpus \
   --readFilesCommand zcat \
   --outFilterType BySJout \
   --alignSJoverhangMin 8 \
   --alignSJDBoverhangMin 1 \
   --outFilterMismatchNmax 999

# Run 2-pass mapping (improve alignments using table of splice junctions and create a new index)
STAR --genomeDir $genomeDir \
   --readFilesIn $reads \
   --runThreadN $task.cpus \
   --readFilesCommand zcat \
   --outFilterType BySJout \
   --alignSJoverhangMin 8 \
   --alignSJDBoverhangMin 1 \
   --outFilterMismatchNmax 999 \
   --sjdbFileChrStartEnd SJ.out.tab \
   --outSAMtype BAM SortedByCoordinate \
   --outSAMattrRGline ID:$replicateId LB:library PL:illumina PU:machine SM:GM12878

# Select only unique alignments, no multimaps
(samtools view -H Aligned.sortedByCoord.out.bam; samtools view Aligned.sortedByCoord.out.bam| 
grep -w 'NH:i:1') \
|samtools view -Sb - > Aligned.sortedByCoord.uniq.bam

# Index the BAM file
samtools index Aligned.sortedByCoord.uniq.bam
"""
}

rnaseq.nf

nextflow.enable.dsl=2

params.reads = "${launchDir}/data/*.fastq"
params.genome = "${launchDir}/genome/genome.fna"

include { STAR_INDEX; STAR_ALIGN } from './modules.nf'

workflow {

    reads_ch = Channel.fromFilePairs(params.reads)

    // Execute processes in the workflow
    STAR_INDEX(params.genome)
    STAR_ALIGN(params.genome, STAR_INDEX.out, reads_ch)
}

Bash Output

STAR-mapping. Nextflow • 1.0k views
ADD COMMENT
2

Do a quick .view() on STAR_INDEX.out to confirm that this channel is being populated.

ADD REPLY
0

STAR_INDEX runs properly and the index files are generated.

Sorry for the late reply.

ADD REPLY
1

One thing to try is checking whether reads_ch actually picks up the files; reads_ch.view() is enough to see whether they were parsed correctly. You might also need to turn params.genome into a path channel, i.e. Channel.fromPath(params.genome).

ADD REPLY
1

I agree with @dgtool. The channel is probably empty, so the process can't instantiate a task. Add checkIfExists: true to your fromFilePairs call.

Something like:

reads_ch = Channel.fromFilePairs(params.reads, checkIfExists: true)
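
To make an empty channel fail loudly instead of silently skipping the process, a check along these lines can help. This is only a minimal sketch, not the OP's code: it reuses params.reads and params.genome from the question, and the error message text is made up for illustration.

```nextflow
nextflow.enable.dsl=2

params.reads  = "${launchDir}/data/*.fastq"
params.genome = "${launchDir}/genome/genome.fna"

workflow {
    // Abort with a message if no [id, [files]] tuple is ever emitted
    reads_ch = Channel.fromFilePairs(params.reads, checkIfExists: true)
        .ifEmpty { error "No read pairs found for pattern: ${params.reads}" }

    // Print every emitted tuple; no lines printed means the channel is empty
    reads_ch.view { id, files -> "reads_ch: ${id} -> ${files}" }

    // Same check for the genome, turned into a path channel
    Channel.fromPath(params.genome, checkIfExists: true)
        .view { f -> "genome: ${f}" }
}
```

As far as I know, checkIfExists only complains when the glob matches no files at all, so the ifEmpty step is the part that would catch files that exist but could not be grouped into pairs.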
ADD REPLY
0

I added checkIfExists, and the output is exactly the same. STAR_ALIGN is not running at all. When the script failed to read the fasta files, there was a different error: the process started but then threw an error. Sorry for the late reply.

ADD REPLY
1

Hm, I see. This behavior indicates that inputs are missing. There's not much more we can do without a minimal reproducible example. Sharing the .nextflow.log file may help: run the pipeline again with NXF_TRACE=nextflow before the command (e.g. NXF_TRACE=nextflow nextflow run nf-core/rnaseq -profile test,docker --outdir results) so that we get a more verbose log file.

ADD REPLY
0

Converting the genome to an actual channel was also my first hunch, but the OP doesn't do that for STAR_INDEX, and it works just fine there.

ADD REPLY
0

How does the run end? Do you get an error, or does it just hang? First, I don't think your params.reads is appropriate for fromFilePairs. At least in DSL1, you needed to specify how file pairs are meant to be identified. For example, .fromFilePairs("${params.InDir}/*_{R1,R2}.fastq.gz") creates tuples with the structure [id, [id_R1.fastq.gz, id_R2.fastq.gz]].

ADD REPLY
0

Sorry for the late reply. The script runs without any error.

ADD REPLY
1

This part looks weird to me with the brackets. You often need to run the code block in shell: mode when dealing with special characters, e.g. when using sed 's/bla/etc/g'.

(samtools view -H Aligned.sortedByCoord.out.bam; samtools view Aligned.sortedByCoord.out.bam| 
grep -w 'NH:i:1') \

Comment it out and see if it starts. In multiple years of daily NF use I've never had processes just hang, so it is a weird one.

ADD REPLY
0

Thanks for the reply, it is not reading the fasta files. I can't figure out the reason why.

ADD REPLY
2

Instead of

params.reads = "${launchDir}/data/*.fastq"

try

params.reads = "${launchDir}/data/*_{1,2}.fastq"
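
The reason this pattern matters, as far as I understand fromFilePairs: the grouping key is the part of the file name matched before the {1,2} alternative, and by default only complete groups of two files are emitted. With a plain *.fastq glob each file ends up with its own key, so no pair is ever completed and the channel stays empty without any error. A small sketch to see the grouping, assuming hypothetical file names such as data/sampleA_1.fastq and data/sampleA_2.fastq:

```nextflow
nextflow.enable.dsl=2

// Assumed layout (hypothetical names): data/sampleA_1.fastq, data/sampleA_2.fastq, ...
params.reads = "${launchDir}/data/*_{1,2}.fastq"

workflow {
    // Each emitted item is a tuple: [ sample id, [ read 1 file, read 2 file ] ]
    Channel.fromFilePairs(params.reads, checkIfExists: true)
        .view { id, files -> "${id}: ${files*.name}" }
}
```

With the assumed files this should print something like sampleA: [sampleA_1.fastq, sampleA_2.fastq], which is the tuple shape that STAR_ALIGN expects as tuple val(replicateId), path(reads). For single-end data, fromFilePairs also accepts a size option (for example size: 1) to change how many files make up a group.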

ADD REPLY
1

That worked, thanks a lot. Sorry for the inconvenience.

ADD REPLY
