how to pass trimmed fastP output to bwa in Nextflow
2
0
Entering edit mode
2.2 years ago

Hi Everyone!!

I am trying to pass the FASTP trimmed files to BWAMEM process in nextflow and for some reason after running the code mentioned below (dsl2), I only get one bam file. The FASTP runs on all 24 files but BWAMEM process runs only on single file. I don't understand what's wrong. I am trying to imitate this pipeline


params.memory = "3g"
params.cpus = 1
params.output = "."

process FASTP{
    cpus params.cpus
    memory params.memory
    publishDir "${params.output}/02_adapterTrimming", mode: 'copy'

    input:
        tuple val(sid), path(reads)

    output:
        tuple val(sid), file(fq_1_paired), file(fq_2_paired), emit: trimmed_reads
                file("${sid}.fastp_stats.json")
                file("${sid}.fastp_stats.html")

        script:
    fq_1_paired = sid + '_R1_P.fastq.gz'
    fq_2_paired = sid + '_R2_P.fastq.gz'
    """
    fastp \
    --in1 ${reads[0]} \
    --in2 ${reads[1]}\
    --out1 $fq_1_paired \
    --out2 $fq_2_paired \
    --json ${sid}.fastp_stats.json \
    --html ${sid}.fastp_stats.html
    """
}

process BWAMEM{
        publishDir "${params.output}/03_alignment", mode: 'copy'
        memory params.memory
        cpus params.cpus

        input:
        tuple val(sid), file(reads1), file(reads2)
        val(reference)

        output:
        path "*.sorted.bam", emit: alignments
        path "*.bai"

        shell:
        '''
        ref=$(echo !{reference} | sed -e 's/\\.[^.]*$//')
        id=$(zcat !{reads1} | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
        bwa mem -M -R "$(echo "@RG\\tID:${id}\\tSM:!{sid}\\tPL:ILLUMINA")" -t !{task.cpus} ${ref} !{reads1} !{reads2} | samtools sort -@ !{task.cpus} -o !{sid}.sorted.bam -
        samtools index -@ !{task.cpus} !{sid}.sorted.bam
        '''
}

and the workflow section is

if (params.input != false) {
            Channel.fromFilePairs(params.input, checkIfExists: true )
                .set { input_fastqs }
        }
workflow{
    reference_ch=BWAINDEX.out.bwa_idx.flatten().filter(~/.*fai/)
    FASTP(input_fastqs)
    BWAMEM(FASTP.out[0], reference_ch)
}

I tried BWAMEM(FASTP.out.trimmed_reads.groupTuple(), reference_ch) as well but it's aligning a single sample

N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [pedantic_montalcini] - revision: aeba1fa55a
executor >  local (26)
[11/767e8f] process > BWAINDEX   [100%] 1 of 1 ✔
[96/644b6d] process > FASTP (24) [100%] 24 of 24 ✔
[97/2a2e3d] process > BWAMEM (1) [100%] 1 of 1 ✔
fastp bwa nextflow • 2.1k views
ADD COMMENT
3
Entering edit mode
2.2 years ago
Maxime Garcia ▴ 350

Hi,

Your issue is actually not the trim reads, but the bwa index. Can you try with adding .collect() like this:

reference_ch=BWAINDEX.out.bwa_idx.flatten().filter(~/.*fai/).collect()

Basically the issue was due to the fact that the channel was a queue channel and was getting consumed, with only 1 item inside, you only get one run. The .collect() operator should help it make it out as a value channel. cf https://www.nextflow.io/docs/latest/operator.html#collect

ADD COMMENT
0
Entering edit mode

It works. However, now it gives me the following error. Is there a way to apply toRealPath() with basename?

N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [high_ardinghelli] - revision: 86556531ff
executor >  local (49)
[c3/7283af] process > BWAINDEX   [100%] 1 of 1 ✔
[b8/06c905] process > FASTP (23) [100%] 24 of 24 ✔
[aa/f1fc0f] process > BWAMEM (6) [  0%] 0 of 24
[/home/subudhak/Documents/COVID_Project/nextCov/work/c3/7283afa8ec4bf917d966c37ec390f9/Sars_cov_2.ASM985889v3.dna.toplevel.fa.fai]
Error executing process > 'BWAMEM (5)'

Caused by:
  Process `BWAMEM (5)` terminated with an error exit status (1)

Command executed:


  id=$(zcat 21_S21_L001_R1_P.fastq.gz | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
  bwa mem -M -R "$(echo "@RG\tID:${id}\tSM:21_S21_L001\tPL:ILLUMINA")" -t 1 Sars_cov_2.ASM985889v3.dna.toplevel.fa 21_S21_L001_R1_P.fastq.gz 21_S21_L001_R2_P.fastq.gz | s
amtools sort -@ 1 -o 21_S21_L001.sorted.bam -
  samtools index -@ 1 21_S21_L001.sorted.bam

Command exit status:
  1

Command output:
  (empty)
executor >  local (49)
[c3/7283af] process > BWAINDEX    [100%] 1 of 1 ✔
[b8/06c905] process > FASTP (23)  [100%] 24 of 24 ✔
[ba/be5a18] process > BWAMEM (15) [100%] 23 of 23, failed: 23
[/home/subudhak/Documents/COVID_Project/nextCov/work/c3/7283afa8ec4bf917d966c37ec390f9/Sars_cov_2.ASM985889v3.dna.toplevel.fa.fai]
Error executing process > 'BWAMEM (5)'

Command error:
  [E::bwa_idx_load_from_disk] fail to locate the index files
  samtools sort: failed to read header from "-"

Work dir:
  /home/subudhak/Documents/COVID_Project/nextCov/work/37/36a0ef6525b44c33e678e87e8cc517

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
ADD REPLY
0
Entering edit mode

Thanks I resolved this issue with emitting .fa file directly and using toRealPath

ADD REPLY
2
Entering edit mode
2.2 years ago
Maxime Garcia ▴ 350

Any more information about what actually went wrong in the .command.log or .command.err in the work folder related to that particular task?

ADD COMMENT
1
Entering edit mode

The final code used was

params.memory = "3g"
params.cpus = 1
params.output = "."


process BWAINDEX{
        publishDir "${params.output}/00_index", mode: 'copy'
        memory params.memory
        cpus params.cpus

        input:
        path (fasta)

        output:
        path "$fasta", emit: bwa_idx        <--------------------Made changes here ######################
        path (fasta)

        script:
        """
        bwa index $fasta
        gatk CreateSequenceDictionary -R $fasta
        samtools faidx $fasta
        """
        }
process FASTP{
    cpus params.cpus
    memory params.memory
    publishDir "${params.output}/02_adapterTrimming", mode: 'copy'

    input:
        tuple val(sid), path(reads)

    output:
        tuple val(sid), file(fq_1_paired), file(fq_2_paired), emit: trimmed_reads
                file("${sid}.fastp_stats.json")
                file("${sid}.fastp_stats.html")

        script:
    fq_1_paired = sid + '_R1_P.fastq.gz'
    fq_2_paired = sid + '_R2_P.fastq.gz'
    """
    fastp \
    --in1 ${reads[0]} \
    --in2 ${reads[1]}\
    --out1 $fq_1_paired \
    --out2 $fq_2_paired \
    --json ${sid}.fastp_stats.json \
    --html ${sid}.fastp_stats.html
    """
}

process BWAMEM{
        publishDir "${params.output}/03_alignment", mode: 'copy'
        memory params.memory
        cpus params.cpus

        input:
        tuple val(sid), file(reads1), file(reads2)
        path(reference)                 <--------------------Made changes here ######################

        output:
        path "*.sorted.bam", emit: alignments
        path "*.bai"

        shell:
        '''
                                                                    <--------------------Made changes here (deleted this line) ######################
        id=$(zcat !{reads1} | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
        bwa mem -M -R "$(echo "@RG\\tID:${id}\\tSM:!{sid}\\tPL:ILLUMINA")" -t !{task.cpus} !{reference.toRealPath()} !{reads1} !{reads2} | samtools sort -@ !{task.cpus} -o !{sid}.sorted.bam -     <--------------------Made changes here ######################
        samtools index -@ !{task.cpus} !{sid}.sorted.bam
        '''
}

if (params.input != false) {
            Channel.fromFilePairs(params.input, checkIfExists: true )
                .set { input_fastqs }
        }
workflow{
    BWAINDEX(params.ref)
    FASTP(input_fastqs)
   BWAMEM(FASTP.out[0], BWAINDEX.out[0])        <--------------------Made changes here ######################
}
ADD REPLY

Login before adding your answer.

Traffic: 1963 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6