Hello,
Is it possible to generate an identity score value between two sequences using Mauve? Or is there another software that can provide such value?
Thank you
FastANI has been recommended by users here for this type of comparison (since you are mentioning Mauve, you are likely referring to genomes): https://github.com/ParBLiSS/FastANI
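If you go the FastANI route, the score comes out as a plain tab-separated file. Below is a minimal Python sketch of reading it. It assumes the five-column layout described in the FastANI README (query, reference, ANI, count of bidirectional fragment mappings, total query fragments) and an output file produced with something like `fastANI -q query.fa -r reference.fa -o fastani_out.txt`. The file names and the example line are made up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class AniHit:
    query: str
    reference: str
    ani: float             # average nucleotide identity, in percent
    mapped_fragments: int  # count of bidirectional fragment mappings
    total_fragments: int   # total fragments cut from the query


def parse_fastani(handle):
    """Yield one AniHit per non-empty line of FastANI tab-separated output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, reference, ani, mapped, total = row[:5]
        yield AniHit(query, reference, float(ani), int(mapped), int(total))


if __name__ == "__main__":
    # Invented example line; replace io.StringIO(...) with open("fastani_out.txt").
    example = io.StringIO("query.fa\treference.fa\t97.5\t1200\t1500\n")
    for hit in parse_fastani(example):
        fraction = 100 * hit.mapped_fragments / hit.total_fragments
        print(f"{hit.query} vs {hit.reference}: ANI {hit.ani:.2f}% "
              f"({fraction:.1f}% of query fragments mapped)")
```

The mapped/total fragment ratio is worth reporting next to the ANI, because, as I understand FastANI, the ANI is computed only over the fragments that map.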
The nf-core/pairgenomealign pipeline, which I develop, reports a percent identity score for pairs of genomes. In the current version (2.0.0), gaps are counted as mismatches, but I plan to update it so that an alternative score that ignores gaps is also provided. The pipeline uses LAST as its aligner. If you run the last-train program directly or through the pipeline, you will also find an estimate of the percent similarity (ignoring gaps) in the trained parameter file that it outputs.
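To make the difference between the two scores concrete, here is a toy Python sketch (not the pipeline's code, which works on LAST alignments). It computes percent identity from a pair of gapped, equal-length aligned strings, either counting gap columns as mismatches or leaving them out of the denominator. The alignment in the example is invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gaps: bool = True) -> float:
    """Percent identity of two equal-length aligned sequences ('-' = gap).

    count_gaps=True:  columns with a gap count as mismatches.
    count_gaps=False: columns with a gap are left out of the denominator.
    Columns where both sequences have a gap are skipped in either mode.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        if (x == "-" or y == "-") and not count_gaps:
            continue
        columns += 1
        if x == y:  # a gap never equals a base, so gap columns add no match
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


if __name__ == "__main__":
    a = "ACGT-ACGT"  # invented toy alignment
    b = "ACGTTACGA"
    print(round(percent_identity(a, b), 2))          # 77.78 (7 of 9 columns)
    print(percent_identity(a, b, count_gaps=False))  # 87.5  (7 of 8 columns)
```

The same 7 matching columns give two different scores depending only on whether the gap column is in the denominator.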
Thank you. I prepared the samplesheet.csv file as:
HoVa25_chr1, chr1_out.fa
Query_1, Valg_chr1_ref.fa
Query_2, Valg_V25_chr1_Cns.fa
Then I opened a terminal in the folder containing all the files and ran:
nextflow run nf-core/pairgenomealign --target ./target_genome_file.fa --input ./samplesheet.csv --outdir ./results -profile docker
but I got the error:
ERROR ~ Validation of pipeline parameters failed!
-- Check '.nextflow.log' file for details
The following invalid input values have been detected:
* --input (./samplesheet.csv): Validation of file failed:
-> Entry 1: Missing required field(s): sample, fasta
-> Entry 2: Missing required field(s): sample, fast
What is the correct syntax?
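The validation error lists `sample` and `fasta` as missing for every entry, which suggests the file has no header row naming those columns (the samplesheet used later in this thread does start with `sample, fasta`). A small Python sketch that produces the sheet with that header row; the rows are the ones from the post above, and the script only prints, so you would redirect it, e.g. `python make_samplesheet.py > samplesheet.csv` (hypothetical script name).

```python
import csv
import io


def samplesheet_text(samples):
    """Return CSV text with a 'sample,fasta' header, one row per sample.

    `samples` maps a sample name to the path of its FASTA file.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # plain Unix line endings
    writer.writerow(["sample", "fasta"])  # column names from the validation error
    for name, fasta in samples.items():
        writer.writerow([name, fasta])
    return buffer.getvalue()


if __name__ == "__main__":
    print(samplesheet_text({
        "HoVa25_chr1": "chr1_out.fa",
        "Query_1": "Valg_chr1_ref.fa",
        "Query_2": "Valg_V25_chr1_Cns.fa",
    }), end="")
```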
Thank you, I tried but this time I got:
ERROR ~ Validation of pipeline parameters failed!
-- Check '.nextflow.log' file for details
The following invalid input values have been detected:
* --input (./samplesheet.csv): Validation of file failed:
-> Entry 1: Error for field 'fasta' (chr1_out.fa): the file or directory 'chr1_out.fa' does not exist (Fasta file for genomes must be provided, cannot contain spaces and must have extension `.fa`, `.fa.gz`, `.fna`, `.fna.gz`, `.fasta` or `.fasta.gz`)
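The conditions listed in that message (the file exists, contains no spaces, and has one of the listed extensions) can be checked locally before relaunching. A rough Python sketch; it assumes relative paths are resolved from the directory where Nextflow is launched, as in this run, and it is not the pipeline's own validator, so it may accept or reject things the pipeline does not.

```python
import csv
import sys
from pathlib import Path

# Extensions listed in the pipeline's validation message.
ALLOWED_SUFFIXES = (".fa", ".fa.gz", ".fna", ".fna.gz", ".fasta", ".fasta.gz")


def check_samplesheet(text, file_exists=lambda p: Path(p).is_file()):
    """Return a list of problems found in samplesheet text (empty if none)."""
    # Strip whitespace around each cell, e.g. the space in "sample, fasta".
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(text.splitlines()) if row]
    if not rows or "sample" not in rows[0] or "fasta" not in rows[0]:
        return ["first row must be a header naming the columns: sample,fasta"]
    s_col, f_col = rows[0].index("sample"), rows[0].index("fasta")
    problems = []
    for n, row in enumerate(rows[1:], start=1):  # numbered like "Entry n" in the log
        if len(row) <= max(s_col, f_col) or not row[s_col] or not row[f_col]:
            problems.append(f"entry {n}: missing sample or fasta")
            continue
        fasta = row[f_col]
        if " " in fasta:
            problems.append(f"entry {n}: {fasta!r} contains a space")
        if not fasta.endswith(ALLOWED_SUFFIXES):
            problems.append(f"entry {n}: {fasta!r} has an unsupported extension")
        if not file_exists(fasta):  # relative paths: from the current directory
            problems.append(f"entry {n}: {fasta!r} not found")
    return problems


if __name__ == "__main__":
    sheet = Path(sys.argv[1] if len(sys.argv) > 1 else "samplesheet.csv")
    if sheet.is_file():
        print("\n".join(check_samplesheet(sheet.read_text())) or "no problems found")
    else:
        print(f"{sheet} not found")
```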
I then created the file chr1_out.fa in the working folder but then I got:
[- ] NFC…AIRGENOMEALIGN:CUTN_TARGET -
[- ] NFC…PAIRGENOMEALIGN:CUTN_QUERY -
[- ] NFC…IRGENOMEALIGN:ASSEMBLYSCAN -
[- ] NFC…IQC_ASSEMBLYSCAN_PLOT_DATA -
[- ] NFC…ALIGN_M2O:ALIGNMENT_LASTDB -
[- ] NFC…RALIGN_M2O:ALIGNMENT_TRAIN -
[- ] NFC…N_M2O:ALIGNMENT_LASTAL_M2O -
[- ] NFC…_M2O:ALIGNMENT_DOTPLOT_M2O -
[- ] NFC…GN_M2O:ALIGNMENT_SPLIT_O2O -
[- ] NFC…_M2O:ALIGNMENT_DOTPLOT_O2O -
[- ] NFC…GN:PAIRGENOMEALIGN:MULTIQC -
ERROR ~ No such file or directory: /home/gigiux/Downloads/ALIGN/target_genome_file.fa
-- Check script '/home/gigiux/.nextflow/assets/nf-core/pairgenomealign/main.nf' at line: 83 or see '.nextflow.log' file for more details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
-[nf-core/pairgenomealign] Pipeline completed with errors-
I also created target_genome_file.fa, but now I got the error:
ERROR ~ Error executing process > 'NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:PAIRALIGN_M2O:ALIGNMENT_LASTDB (target)'
Caused by:
Process requirement exceeds available CPUs -- req: 6; avail: 4
Command executed:
mkdir lastdb
lastdb \
-R01 -c -uYASS -S2 \
-P 6 \
lastdb/target \
target_genome_file.fa
cat <<-END_VERSIONS > versions.yml
"NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:PAIRALIGN_M2O:ALIGNMENT_LASTDB":
last: $(lastdb --version 2>&1 | sed 's/lastdb //')
END_VERSIONS
Command exit status:
-
Command output:
(empty)
Work dir:
/home/gigiux/Downloads/ALIGN/work/07/20c49e14a447e696f1bbf5c31dd86f
Container:
community.wave.seqera.io/library/last:1608--f41c047f7dc37e30
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
Yes, you can make a local nf.conf file and include it with -c nf.conf, taking as an example the following profile from the default configs:
gitpod {
    executor.name = 'local'
    executor.cpus = 4
    executor.memory = 8.GB
    process {
        resourceLimits = [
            memory: 8.GB,
            cpus  : 4,
            time  : 1.h
        ]
    }
}
https://github.com/nf-core/pairgenomealign/blob/2.0.0/nextflow.config#L172-L183
Or, if this profile happens to fit your needs, you can include it with the -profile option. You probably still need docker, so that would be -profile docker,gitpod (not tested).
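A Python sketch that generates such an nf.conf for the current machine, capping CPUs at `os.cpu_count()`. It mirrors the gitpod profile quoted above but without the profile wrapper, since settings in a file passed with `-c` apply directly. It assumes a Nextflow version that supports `resourceLimits` (the pipeline's own profile uses it); the memory and time values are illustrative placeholders to adjust. Run it as, e.g., `python make_nf_conf.py > nf.conf` (hypothetical script name), then add `-c nf.conf` to the command.

```python
import os

# Same settings as the gitpod profile, written as top-level config.
CONFIG_TEMPLATE = """\
// Local resource caps; adjust memory and time to your machine.
executor.name   = 'local'
executor.cpus   = {cpus}
executor.memory = {memory_gb}.GB

process {{
    resourceLimits = [
        memory: {memory_gb}.GB,
        cpus  : {cpus},
        time  : {hours}.h
    ]
}}
"""


def make_config(cpus=None, memory_gb=8, hours=24):
    """Return nf.conf text; cpus defaults to the number of CPUs seen by Python."""
    cpus = cpus or os.cpu_count() or 1
    return CONFIG_TEMPLATE.format(cpus=cpus, memory_gb=memory_gb, hours=hours)


if __name__ == "__main__":
    print(make_config(), end="")
```

With a cap of 4 CPUs, a process such as ALIGNMENT_LASTDB that asks for 6 should be limited to 4 instead of failing with "Process requirement exceeds available CPUs".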
I finally got access to a more powerful machine, but this time I got the error:
$ ~/src/Nextflow/nextflow run nf-core/pairgenomealign ./target_genome_file.fa --input ./samplesheet.csv --outdir ./results -profile docker
N E X T F L O W ~ version 25.05.0-edge
Launching `https://github.com/nf-core/pairgenomealign` [thirsty_escher] DSL2 - revision: 0005fc64fc [master]
------------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/pairgenomealign 2.2.0dev
------------------------------------------------------
Input/output options
input : ./samplesheet.csv
outdir : ./results
Alignment options
last_split_mismap : 1e-05
Generic options
trace_report_suffix: 2025-06-08_10-25-29
Core Nextflow options
revision : master
runName : thirsty_escher
containerEngine : docker
launchDir : /home/gigiux/Downloads/ALIGN
workDir : /home/gigiux/Downloads/ALIGN/work
projectDir : /home/gigiux/.nextflow/assets/nf-core/pairgenomealign
userName : gigiux
profile : docker
configFiles :
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/pairgenomealign/blob/master/CITATIONS.md
ERROR ~ Validation of pipeline parameters failed!
-- Check '.nextflow.log' file for details
The following invalid input values have been detected:
* Missing required parameter(s): target
-- Check script '/home/gigiux/.nextflow/assets/nf-core/pairgenomealign/subworkflows/nf-core/utils_nfschema_plugin/main.nf' at line: 39 or see '.nextflow.log' file for more details
What is the "target" parameter that is reported as missing?
Thank you
Usage section for the pipeline: https://nf-co.re/pairgenomealign/2.1.0/docs/usage/
You need to specify a "target" genome using the --target parameter, which in your case is target_genome_file.fa (or whatever the real name is).
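For reference, a Python sketch that assembles the full command with the target passed as the named `--target` parameter (as the log above shows, a file given positionally after the pipeline name is not picked up as the target). The helper name is made up; the parameters are the ones used in this thread.

```python
import shlex
from pathlib import Path


def pairgenomealign_command(target, samplesheet, outdir="results",
                            profile="docker", check_files=True):
    """Return the argument list for an nf-core/pairgenomealign run."""
    if check_files:
        missing = [str(p) for p in (target, samplesheet) if not Path(p).is_file()]
        if missing:
            raise FileNotFoundError("not found: " + ", ".join(missing))
    return [
        "nextflow", "run", "nf-core/pairgenomealign",
        "--target", str(target),      # named pipeline parameter, not positional
        "--input", str(samplesheet),  # samplesheet with a sample,fasta header
        "--outdir", str(outdir),
        "-profile", profile,          # single dash: a Nextflow option
    ]


if __name__ == "__main__":
    cmd = pairgenomealign_command("./target_genome_file.fa", "./samplesheet.csv",
                                  outdir="./results", check_files=False)
    print(shlex.join(cmd))
```

The double-dash options are pipeline parameters and the single-dash `-profile` is a Nextflow option, which is why the two styles appear side by side.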
Yes, the --target parameter was missing. I tried again, this time with a samplesheet.csv file containing only:
sample, fasta
Query_1, Valg_chr1_ref.fa
(I am essentially comparing Valg_V25_chr1_Cns.fa to Valg_chr1_ref.fa), but I got:
$ ~/src/Nextflow/nextflow run nf-core/pairgenomealign --target ./Valg_V25_chr1_Cns.fa --input ./samplesheet.csv --outdir ./results -profile docker
N E X T F L O W ~ version 25.05.0-edge
Launching `https://github.com/nf-core/pairgenomealign` [zen_goldberg] DSL2 - revision: 0005fc64fc [master]
------------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/pairgenomealign 2.2.0dev
------------------------------------------------------
Input/output options
input : ./samplesheet.csv
target : ./Valg_V25_chr1_Cns.fa
outdir : ./results
Alignment options
last_split_mismap : 1e-05
Generic options
trace_report_suffix: 2025-06-09_07-02-32
Core Nextflow options
revision : master
runName : zen_goldberg
containerEngine : docker
launchDir : /home/gigiux/Downloads/ALIGN
workDir : /home/gigiux/Downloads/ALIGN/work
projectDir : /home/gigiux/.nextflow/assets/nf-core/pairgenomealign
userName : gigiux
profile : docker
configFiles :
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/pairgenomealign/blob/master/CITATIONS.md
executor > local (4)
[91/dc3644] NFC…CUTN_TARGET (targetGenome) | 0 of 1 x
[79/83afe9] NFC…ALIGN:CUTN_QUERY (Query_1) | 0 of 1 x
[36/98987c] NFC…IGN:ASSEMBLYSCAN (Query_1) | 0 of 1 x
[- ] NFC…IQC_ASSEMBLYSCAN_PLOT_DATA -
[6b/b049b8] NFC…:ALIGNMENT_LASTDB (target) | 0 of 1 x
[- ] NFC…RALIGN_M2O:ALIGNMENT_TRAIN -
[- ] NFC…N_M2O:ALIGNMENT_LASTAL_M2O -
[- ] NFC…_M2O:ALIGNMENT_DOTPLOT_M2O -
[- ] NFC…GN_M2O:ALIGNMENT_SPLIT_O2O -
[- ] NFC…_M2O:ALIGNMENT_DOTPLOT_O2O -
[- ] NFC…GN:PAIRGENOMEALIGN:MULTIQC | 0 of 1
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/pairgenomealign] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:CUTN_TARGET (targetGenome)'
Caused by:
Process `NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:CUTN_TARGET (targetGenome)` terminated with an error exit status (127)
Command executed:
seqtk \
cutN \
-n 10 -p 100000 \
-g Valg_V25_chr1_Cns.fa \
> targetGenome.bed
cat <<-END_VERSIONS > versions.yml
"NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:CUTN_TARGET":
seqtk: $(echo $(seqtk 2>&1) | sed 's/^.*Version: //; s/ .*$//')
END_VERSIONS
Command exit status:
127
Command output:
(empty)
Command error:
.command.run: line 304: docker: command not found
Work dir:
/home/gigiux/Downloads/ALIGN/work/91/dc3644158a6a35e027a160fa96efcd
Container:
quay.io/biocontainers/seqtk:1.4--he4a0461_1
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
What went wrong this time? Thank you
".command.run: line 304: docker: command not found": you need either to install the software yourself, or to let Nextflow download it for you using conda, docker or singularity. Here you specified -profile docker, so the pipeline tries to use Docker; however, Docker appears not to be installed.
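A quick Python sketch to see which of those engines is on your PATH before choosing a profile. Finding the executable does not guarantee it works (for Docker, the daemon must be running and your user must be allowed to use it), and the preference order below is an arbitrary choice.

```python
import shutil

# Engines named in the reply above; each has a profile of the same name.
CANDIDATES = ("docker", "singularity", "conda")


def available_profile(which=shutil.which):
    """Return the first engine whose executable is found on PATH, else None."""
    for tool in CANDIDATES:
        if which(tool):
            return tool
    return None


if __name__ == "__main__":
    profile = available_profile()
    if profile is None:
        print("None of docker, singularity or conda is on PATH; install one first.")
    else:
        print(f"Found {profile}: try -profile {profile}")
```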
According to the manual, Mauve is supposed to produce an identity matrix file: https://darlinglab.org/mauve/user-guide/files.html#:~:text=The%20identity%20matrix%20file,every%20homologous%20nucleotide%20was%20identical.
Thank you, I ran:
progressiveMauve --weight=15 --output=./out_file.mauve chr1.fa
where chr1.fa is a multi-FASTA file with the two genomes to align. The outputs are out_file.mauve, out_file.mauve.backbone and out_file.mauve.bbcols, but none of them provides an identity score. I think I am missing an argument...
Perhaps the matrix generation requires more than two input genomes. Can you try providing one of the genomes twice in your input file?
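A Python sketch of that suggestion: read the multi-FASTA, append a copy of the first genome under a dummy header, and write a new file. `chr1.fa` is the input named above; `chr1_with_copy.fa` and `dummy_copy` are made-up names. It only prepares the input (whether Mauve then writes an identity matrix is what the test is meant to find out), and it reads the whole file into memory.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def with_duplicate(records, index=0, dummy_name="dummy_copy"):
    """Return the records plus a copy of records[index] under a dummy header."""
    _, sequence = records[index]
    return records + [(dummy_name, sequence)]


def to_fasta(records, width=80):
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    source = Path("chr1.fa")
    if source.is_file():
        records = with_duplicate(read_fasta(source.read_text()))
        Path("chr1_with_copy.fa").write_text(to_fasta(records))
        print(f"wrote {len(records)} records to chr1_with_copy.fa")
```

Giving the copy a distinct header keeps the three entries distinguishable in Mauve's output.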
I am not sure I can attach such a large file; would the ID of the genome be the same? NZ_CP184833.1 (https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP184833.1). Thank you
What I was thinking is that you could duplicate one of the genomes in this file (giving the copy a dummy name) and then see if mauve produces a percent identity file when there are 3 genomes in the file.
I see, thanks. I ran mauve with 3 input files and got out_file.mauve.bbcols, out_file.mauve and out_file.mauve.backbone, but none with the identity value...