Question

How to generate an identity score between aligned sequences with Mauve?

11 weeks ago

marongiu.luigi ▴ 750

Hello,

Is it possible to generate an identity score between two sequences using Mauve? Or is there other software that can provide such a value?

Thank you

identity score alignment mauve • 1.5k views

ADD COMMENT • link updated 5 weeks ago by Charles Plessy ★ 2.9k • written 11 weeks ago by marongiu.luigi ▴ 750

According to the manual, Mauve is supposed to produce an identity matrix file: https://darlinglab.org/mauve/user-guide/files.html#:~:text=The%20identity%20matrix%20file,every%20homologous%20nucleotide%20was%20identical.

ADD REPLY • link 11 weeks ago by GenoMax 152k

Thank you. I ran progressiveMauve --weight=15 --output=./out_file.mauve chr1.fa where chr1.fa is a multifasta with the two genomes to align. The output files are out_file.mauve, out_file.mauve.backbone and out_file.mauve.bbcols, but none of them provides an identity score. I think I am missing an argument...

ADD REPLY • link 11 weeks ago by marongiu.luigi ▴ 750

Perhaps the matrix generation requires more than two input genomes. Can you try providing one of the genomes two times in your input file?

ADD REPLY • link 11 weeks ago by GenoMax 152k

I am not sure I can attach such a large file; would the ID of the genome be the same? NZ_CP184833.1 (https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP184833.1). Thank you

ADD REPLY • link 11 weeks ago by marongiu.luigi ▴ 750

where chr1.fa is a multifasta with the two genomes to align.

What I was thinking is that you could duplicate one of the genomes in this file (giving it a dummy name) and then see whether Mauve produces a percent identity file when there are 3 genomes in the file.
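The duplication step suggested above can be sketched in Python. This is only a minimal illustration: the function name `duplicate_first_record` and the default name `dummy_copy` are hypothetical, and the sketch says nothing about whether Mauve will then write an identity file.

```python
def duplicate_first_record(fasta_text, dummy_name="dummy_copy"):
    """Return the FASTA text with a renamed copy of its first record appended."""
    records = []  # list of (header, [sequence lines])
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records and line.strip():
            records[-1][1].append(line.strip())
    if not records:
        raise ValueError("no FASTA records found")
    # Copy the sequence lines of the first record under the dummy name.
    records.append((dummy_name, list(records[0][1])))
    return "".join(f">{h}\n" + "\n".join(s) + "\n" for h, s in records)


if __name__ == "__main__":
    # Toy two-record input; real genomes would be read from chr1.fa instead.
    print(duplicate_first_record(">genomeA\nACGT\n>genomeB\nTTTT\n"), end="")
```

For a real file, the text would be read from and written back to disk; for very large genomes a streaming approach may be preferable to holding everything in memory.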

ADD REPLY • link 11 weeks ago by GenoMax 152k

I see, thanks. I ran Mauve with 3 inputs and got out_file.mauve.bbcols, out_file.mauve and out_file.mauve.backbone, but none of them contains the identity value...

ADD REPLY • link 10 weeks ago by marongiu.luigi ▴ 750

score 0 · Answer 1 · 2025-04-27

11 weeks ago

GenoMax 152k

FastANI has been recommended by users here for this type of comparison (since you are mentioning Mauve you are likely referring to genomes): https://github.com/ParBLiSS/FastANI

ADD COMMENT • link 11 weeks ago by GenoMax 152k

Thank you. I used skani, but it requires building a model of several genomes together. Can I use it for only two genomes?

ADD REPLY • link 11 weeks ago by marongiu.luigi ▴ 750

score 0 · Answer 2 · 2025-04-27

11 weeks ago

Charles Plessy ★ 2.9k

The nf-core/pairgenomealign pipeline, which I develop, reports a percent identity score for pairs of genomes. In the current version (2.0.0) gaps are counted as mismatches, but I plan to update it so that an alternative score ignoring gaps is also provided. The pipeline uses LAST as its aligner. If you run the last-train program directly or through the pipeline, you will also find an estimate of the percent similarity (ignoring gaps) in the trained parameter file that it outputs.
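To make the difference between the two scores concrete, here is a minimal Python sketch. It is not the pipeline's or LAST's code; the function `percent_identity` and the toy sequences are illustrative assumptions, and the input is assumed to be two already-aligned strings of equal length with `-` as the gap character.

```python
def percent_identity(a, b, ignore_gaps=False, gap="-"):
    """Percent identity between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        has_gap = x == gap or y == gap
        # Gap-vs-gap columns are never counted; other gap columns are
        # skipped only when gaps are being ignored.
        if has_gap and (ignore_gaps or x == y):
            continue
        compared += 1
        if not has_gap and x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    a, b = "ACGT-A", "ACCTTA"
    print(percent_identity(a, b))                    # gaps counted as mismatches
    print(percent_identity(a, b, ignore_gaps=True))  # gap columns ignored
```

In this toy alignment there are 4 matching columns out of 6, of which one is a gap column, so the score rises from about 66.7% to 80% when gaps are ignored.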

ADD COMMENT • link 11 weeks ago by Charles Plessy ★ 2.9k

Thank you. I prepared the samplesheet.csv file as

HoVa25_chr1, chr1_out.fa
Query_1, Valg_chr1_ref.fa
Query_2, Valg_V25_chr1_Cns.fa

opened the terminal in the folder containing all the files and ran nextflow run nf-core/pairgenomealign --target ./target_genome_file.fa --input ./samplesheet.csv --outdir ./results -profile docker but I got the error:

ERROR ~ Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* --input (./samplesheet.csv): Validation of file failed:
    -> Entry 1: Missing required field(s): sample, fasta
    -> Entry 2: Missing required field(s): sample, fast

What is the correct syntax?

ADD REPLY • link 10 weeks ago by marongiu.luigi ▴ 750

Looks like a file header is required for the csv file (as below). Try this file.

sample,fasta
HoVa25_chr1,chr1_out.fa
Query_1,Valg_chr1_ref.fa
Query_2,Valg_V25_chr1_Cns.fa
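A quick pre-flight check of the samplesheet can catch this kind of problem before launching the pipeline. The Python sketch below is an assumption-laden illustration, not the pipeline's own validator: the function `check_samplesheet` is hypothetical, and its rules (a `sample,fasta` header, no spaces in the FASTA path, the listed extensions, the file must exist) are taken only from the validation messages quoted in this thread. It is deliberately strict, so the real validator may accept files that this check flags.

```python
import csv
import io
import os

# Extensions listed in the pipeline's validation message quoted in this thread.
ALLOWED_EXT = (".fa", ".fa.gz", ".fna", ".fna.gz", ".fasta", ".fasta.gz")


def check_samplesheet(text, base_dir="."):
    """Return a list of human-readable problems found in samplesheet text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if "sample" not in header or "fasta" not in header:
        return ["header line must name the columns 'sample' and 'fasta'"]
    problems = []
    for n, row in enumerate(reader, start=1):
        fasta = row["fasta"] or ""
        if " " in fasta:
            problems.append(f"entry {n}: fasta value contains a space")
        if not fasta.endswith(ALLOWED_EXT):
            problems.append(f"entry {n}: unexpected extension in '{fasta}'")
        if not os.path.exists(os.path.join(base_dir, fasta.strip())):
            problems.append(f"entry {n}: file '{fasta.strip()}' not found")
    return problems


if __name__ == "__main__":
    with open("samplesheet.csv") as handle:
        for problem in check_samplesheet(handle.read()):
            print(problem)
```

Run from the folder that contains samplesheet.csv and the FASTA files; an empty output means none of these particular checks failed.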

ADD REPLY • link 10 weeks ago by GenoMax 152k

Thank you, I tried but this time I got:

ERROR ~ Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* --input (./samplesheet.csv): Validation of file failed:
    -> Entry 1: Error for field 'fasta' (chr1_out.fa): the file or directory 'chr1_out.fa' does not exist (Fasta file for genomes must be provided, cannot contain spaces and must have extension `.fa`, `.fa.gz`, `.fna`, `.fna.gz`, `.fasta` or `.fasta.gz`)

I then created the file chr1_out.fa in the working folder but then I got:

[-        ] NFC…AIRGENOMEALIGN:CUTN_TARGET -
[-        ] NFC…PAIRGENOMEALIGN:CUTN_QUERY -
[-        ] NFC…AIRGENOMEALIGN:CUTN_TARGET -
[-        ] NFC…PAIRGENOMEALIGN:CUTN_QUERY -
[-        ] NFC…IRGENOMEALIGN:ASSEMBLYSCAN -
[-        ] NFC…IQC_ASSEMBLYSCAN_PLOT_DATA -
[-        ] NFC…ALIGN_M2O:ALIGNMENT_LASTDB -
[-        ] NFC…RALIGN_M2O:ALIGNMENT_TRAIN -
[-        ] NFC…N_M2O:ALIGNMENT_LASTAL_M2O -
[-        ] NFC…_M2O:ALIGNMENT_DOTPLOT_M2O -
[-        ] NFC…GN_M2O:ALIGNMENT_SPLIT_O2O -
[-        ] NFC…_M2O:ALIGNMENT_DOTPLOT_O2O -
[-        ] NFC…GN:PAIRGENOMEALIGN:MULTIQC -
ERROR ~ No such file or directory: /home/gigiux/Downloads/ALIGN/target_genome_file.fa

 -- Check script '/home/gigiux/.nextflow/assets/nf-core/pairgenomealign/main.nf' at line: 83 or see '.nextflow.log' file for more details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details
-[nf-core/pairgenomealign] Pipeline completed with errors-

ADD REPLY • link 10 weeks ago by marongiu.luigi ▴ 750

ERROR ~ No such file or directory: /home/gigiux/Downloads/ALIGN/target_genome_file.fa

Looks like the error is now with a different file.

ADD REPLY • link 10 weeks ago by GenoMax 152k

I also created target_genome_file.fa, but now I got the error:

ERROR ~ Error executing process > 'NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:PAIRALIGN_M2O:ALIGNMENT_LASTDB (target)'

Caused by:
  Process requirement exceeds available CPUs -- req: 6; avail: 4


Command executed:

  mkdir lastdb
  lastdb \
      -R01 -c -uYASS -S2 \
      -P 6 \
      lastdb/target \
      target_genome_file.fa

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:PAIRALIGN_M2O:ALIGNMENT_LASTDB":
      last: $(lastdb --version 2>&1 | sed 's/lastdb //')
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /home/gigiux/Downloads/ALIGN/work/07/20c49e14a447e696f1bbf5c31dd86f

Container:
  community.wave.seqera.io/library/last:1608--f41c047f7dc37e30

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

ADD REPLY • link 10 weeks ago by marongiu.luigi ▴ 750

Process requirement exceeds available CPUs -- req: 6; avail: 4

You will need to adjust the hardware configuration: the process requests 6 CPUs but only 4 are available.

ADD REPLY • link 10 weeks ago by GenoMax 152k

Yes, you can make a local nf.conf file and include it with -c nf.conf, modelled on the following profile from the default configs:

    gitpod {
        executor.name           = 'local'
        executor.cpus           = 4
        executor.memory         = 8.GB
        process {
            resourceLimits = [
                memory: 8.GB,
                cpus  : 4,
                time  : 1.h
            ]
        }
    }

https://github.com/nf-core/pairgenomealign/blob/2.0.0/nextflow.config#L172-L183

Or, if this profile happens to fit your needs well, you can include it with the -profile option. You probably still need docker, so that would be -profile docker,gitpod (not tested).
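As a rough sketch of the first option, the Python snippet below renders a standalone nf.conf with the same settings as the gitpod profile, lifted out of the profile block. The function `render_local_config` is hypothetical, the default memory and time values are simply those of the profile above, and none of this has been tested against the pipeline.

```python
import os


def render_local_config(cpus=None, memory_gb=8, hours=1):
    """Render a small Nextflow config capping resources for a local run."""
    if cpus is None:
        cpus = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    return (
        "executor.name   = 'local'\n"
        f"executor.cpus   = {cpus}\n"
        f"executor.memory = {memory_gb}.GB\n"
        "process {\n"
        "    resourceLimits = [\n"
        f"        memory: {memory_gb}.GB,\n"
        f"        cpus  : {cpus},\n"
        f"        time  : {hours}.h\n"
        "    ]\n"
        "}\n"
    )


if __name__ == "__main__":
    # Write the config for a 4-CPU machine; pass it to Nextflow with -c nf.conf.
    with open("nf.conf", "w") as handle:
        handle.write(render_local_config(cpus=4))
```

The memory and time limits should be adjusted to the machine and the size of the genomes; the values here are only the ones from the quoted profile.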

ADD REPLY • link 10 weeks ago by Charles Plessy ★ 2.9k

I finally got access to a more powerful machine, but this time I got the error:

$ ~/src/Nextflow/nextflow run nf-core/pairgenomealign ./target_genome_file.fa --input ./samplesheet.csv --outdir ./results -profile docker

 N E X T F L O W   ~  version 25.05.0-edge

Launching `https://github.com/nf-core/pairgenomealign` [thirsty_escher] DSL2 - revision: 0005fc64fc [master]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/pairgenomealign 2.2.0dev
------------------------------------------------------
Input/output options
  input              : ./samplesheet.csv
  outdir             : ./results

Alignment options
  last_split_mismap  : 1e-05

Generic options
  trace_report_suffix: 2025-06-08_10-25-29

Core Nextflow options
  revision           : master
  runName            : thirsty_escher
  containerEngine    : docker
  launchDir          : /home/gigiux/Downloads/ALIGN
  workDir            : /home/gigiux/Downloads/ALIGN/work
  projectDir         : /home/gigiux/.nextflow/assets/nf-core/pairgenomealign
  userName           : gigiux
  profile            : docker
  configFiles        : 

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
* The nf-core framework
    https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
    https://github.com/nf-core/pairgenomealign/blob/master/CITATIONS.md

ERROR ~ Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* Missing required parameter(s): target

 -- Check script '/home/gigiux/.nextflow/assets/nf-core/pairgenomealign/subworkflows/nf-core/utils_nfschema_plugin/main.nf' at line: 39 or see '.nextflow.log' file for more details

What is the missing target parameter?

Thank you

ADD REPLY • link 5 weeks ago by marongiu.luigi ▴ 750

Usage section for the pipeline: https://nf-co.re/pairgenomealign/2.1.0/docs/usage/

You need to specify a "target" genome using the parameter --target, which in your case is the target_genome_file.fa (or whatever the real name is).

ADD REPLY • link 5 weeks ago by GenoMax 152k

Yes, the --target parameter was missing. I tried again, this time with a samplesheet.csv file containing only

sample, fasta
Query_1, Valg_chr1_ref.fa

(I am essentially comparing Valg_V25_chr1_Cns.fa to Valg_chr1_ref.fa) but I got:

$ ~/src/Nextflow/nextflow run nf-core/pairgenomealign --target ./Valg_V25_chr1_Cns.fa --input ./samplesheet.csv --outdir ./results -profile docker

 N E X T F L O W   ~  version 25.05.0-edge

Launching `https://github.com/nf-core/pairgenomealign` [zen_goldberg] DSL2 - revision: 0005fc64fc [master]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/pairgenomealign 2.2.0dev
------------------------------------------------------
Input/output options
  input              : ./samplesheet.csv
  target             : ./Valg_V25_chr1_Cns.fa
  outdir             : ./results

Alignment options
  last_split_mismap  : 1e-05

Generic options
  trace_report_suffix: 2025-06-09_07-02-32

Core Nextflow options
  revision           : master
  runName            : zen_goldberg
  containerEngine    : docker
  launchDir          : /home/gigiux/Downloads/ALIGN
  workDir            : /home/gigiux/Downloads/ALIGN/work
  projectDir         : /home/gigiux/.nextflow/assets/nf-core/pairgenomealign
  userName           : gigiux
  profile            : docker
  configFiles        : 

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
* The nf-core framework
    https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
    https://github.com/nf-core/pairgenomealign/blob/master/CITATIONS.md

executor >  local (4)
[91/dc3644] NFC…CUTN_TARGET (targetGenome) | 0 of 1 x
[79/83afe9] NFC…ALIGN:CUTN_QUERY (Query_1) | 0 of 1 x
[36/98987c] NFC…IGN:ASSEMBLYSCAN (Query_1) | 0 of 1 x
[-        ] NFC…IQC_ASSEMBLYSCAN_PLOT_DATA -
[6b/b049b8] NFC…:ALIGNMENT_LASTDB (target) | 0 of 1 x
[-        ] NFC…RALIGN_M2O:ALIGNMENT_TRAIN -
[-        ] NFC…N_M2O:ALIGNMENT_LASTAL_M2O -
[-        ] NFC…_M2O:ALIGNMENT_DOTPLOT_M2O -
[-        ] NFC…GN_M2O:ALIGNMENT_SPLIT_O2O -
[-        ] NFC…_M2O:ALIGNMENT_DOTPLOT_O2O -
[-        ] NFC…GN:PAIRGENOMEALIGN:MULTIQC | 0 of 1
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/pairgenomealign] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:CUTN_TARGET (targetGenome)'

Caused by:
  Process `NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:CUTN_TARGET (targetGenome)` terminated with an error exit status (127)


Command executed:

  seqtk \
      cutN \
      -n 10 -p 100000 \
      -g Valg_V25_chr1_Cns.fa \
      > targetGenome.bed

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:CUTN_TARGET":
      seqtk: $(echo $(seqtk 2>&1) | sed 's/^.*Version: //; s/ .*$//')
  END_VERSIONS

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.run: line 304: docker: command not found

Work dir:
  /home/gigiux/Downloads/ALIGN/work/91/dc3644158a6a35e027a160fa96efcd

Container:
  quay.io/biocontainers/seqtk:1.4--he4a0461_1

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

What went wrong this time? Thank you

ADD REPLY • link 5 weeks ago by marongiu.luigi ▴ 750

.command.run: line 304: docker: command not found: you need either to install the software yourself or to let Nextflow fetch it for you using conda, docker or singularity. Here you specified -profile docker and the pipeline tries to use it, but Docker does not appear to be installed.
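Exit status 127 is the shell's usual code for "command not found", which matches the message above. A minimal Python sketch for checking which of these tools are on the PATH before choosing a profile follows; the function name `available_engines` is hypothetical, and finding a tool on the PATH does not prove that it is correctly configured.

```python
import shutil


def available_engines(candidates=("docker", "singularity", "conda")):
    """Return the candidate tools that are found on the current PATH."""
    return [name for name in candidates if shutil.which(name)]


if __name__ == "__main__":
    found = available_engines()
    if found:
        print("Found on PATH:", ", ".join(found))
    else:
        print("None of docker, singularity or conda was found on PATH")
```

If docker is missing from the list, either install it or pick a -profile that corresponds to a tool that was found.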

ADD REPLY • link 5 weeks ago by Charles Plessy ★ 2.9k