Hello,
Is it possible to generate an identity score value between two sequences using Mauve? Or is there another software that can provide such value?
Thank you
FastANI has been recommended by users here for this type of comparison (since you mention Mauve, you are likely referring to genomes): https://github.com/ParBLiSS/FastANI
The nf-core/pairgenomealign pipeline, which I develop, reports a percent identity score for pairs of genomes. In the current version (2.0.0), gaps are counted as mismatches, but I plan to update it so that an alternative score that ignores gaps is also provided. The pipeline uses LAST as its aligner. If you run the last-train program, directly or through the pipeline, you will also find an estimate of the percent similarity (ignoring gaps) in the trained parameter file that it outputs.
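To make the two gap conventions concrete, here is a minimal sketch in Python. It is not the pipeline's or LAST's actual implementation; the function name `percent_identity` and the toy alignment are made up for illustration, and the input is assumed to be the two rows of an existing pairwise alignment with `-` as the gap character.

```python
def percent_identity(aligned_a: str, aligned_b: str, ignore_gaps: bool = False) -> float:
    """Percent identity between two rows of a pairwise alignment.

    ignore_gaps=False: a column with a gap in one row counts as a mismatch.
    ignore_gaps=True:  columns with a gap in either row are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # no residue in either row, nothing to compare
        if a == "-" or b == "-":
            if not ignore_gaps:
                compared += 1  # gap column counted as a mismatch
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    row_a = "ACGT-ACGT"
    row_b = "ACGTTACCT"
    print(percent_identity(row_a, row_b))                    # gaps as mismatches
    print(percent_identity(row_a, row_b, ignore_gaps=True))  # gaps ignored
```

On this toy alignment there are 7 matching columns, 1 mismatch and 1 gap column, so the two conventions divide 7 by 9 and by 8 respectively; the score that ignores gaps is always at least as high as the one that counts them.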
Thank you. I prepared the samplesheet.csv file as
HoVa25_chr1, chr1_out.fa
Query_1, Valg_chr1_ref.fa
Query_2, Valg_V25_chr1_Cns.fa
I then opened a terminal in the folder containing all the files and ran
nextflow run nf-core/pairgenomealign --target ./target_genome_file.fa --input ./samplesheet.csv --outdir ./results -profile docker
but I got the error:
ERROR ~ Validation of pipeline parameters failed!
-- Check '.nextflow.log' file for details
The following invalid input values have been detected:
* --input (./samplesheet.csv): Validation of file failed:
-> Entry 1: Missing required field(s): sample, fasta
-> Entry 2: Missing required field(s): sample, fast
What is the correct syntax?
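Judging from the validation message, which lists `sample` and `fasta` as required fields, the samplesheet probably needs a header row with those two column names. Below is a minimal sketch that builds such a file from the rows in the post, assuming that header and stripping stray spaces after the commas as a precaution. The helper name `samplesheet_text` is made up for illustration, and whether the target genome belongs in the samplesheet at all is not something the error message settles.

```python
import csv
import io


def samplesheet_text(rows):
    """Return CSV text with a 'sample,fasta' header followed by one row per genome."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "fasta"])  # column names taken from the error message
    for sample, fasta in rows:
        # Remove surrounding whitespace such as the space after each comma
        writer.writerow([sample.strip(), fasta.strip()])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        ("HoVa25_chr1", " chr1_out.fa"),
        ("Query_1", " Valg_chr1_ref.fa"),
        ("Query_2", " Valg_V25_chr1_Cns.fa"),
    ]
    with open("samplesheet.csv", "w", newline="") as handle:
        handle.write(samplesheet_text(rows))
```

Without a header row, the first line of the file is presumably read as the header, which would explain why only two entries were reported for a three-line file.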
Thank you, I tried but this time I got:
ERROR ~ Validation of pipeline parameters failed!
-- Check '.nextflow.log' file for details
The following invalid input values have been detected:
* --input (./samplesheet.csv): Validation of file failed:
-> Entry 1: Error for field 'fasta' (chr1_out.fa): the file or directory 'chr1_out.fa' does not exist (Fasta file for genomes must be provided, cannot contain spaces and must have extension `.fa`, `.fa.gz`, `.fna`, `.fna.gz`, `.fasta` or `.fasta.gz`)
I then created the file chr1_out.fa in the working folder but then I got:
[- ] NFC…AIRGENOMEALIGN:CUTN_TARGET -
[- ] NFC…PAIRGENOMEALIGN:CUTN_QUERY -
[- ] NFC…AIRGENOMEALIGN:CUTN_TARGET -
[- ] NFC…PAIRGENOMEALIGN:CUTN_QUERY -
[- ] NFC…IRGENOMEALIGN:ASSEMBLYSCAN -
[- ] NFC…IQC_ASSEMBLYSCAN_PLOT_DATA -
[- ] NFC…ALIGN_M2O:ALIGNMENT_LASTDB -
[- ] NFC…RALIGN_M2O:ALIGNMENT_TRAIN -
[- ] NFC…N_M2O:ALIGNMENT_LASTAL_M2O -
[- ] NFC…_M2O:ALIGNMENT_DOTPLOT_M2O -
[- ] NFC…GN_M2O:ALIGNMENT_SPLIT_O2O -
[- ] NFC…_M2O:ALIGNMENT_DOTPLOT_O2O -
[- ] NFC…GN:PAIRGENOMEALIGN:MULTIQC -
ERROR ~ No such file or directory: /home/gigiux/Downloads/ALIGN/target_genome_file.fa
-- Check script '/home/gigiux/.nextflow/assets/nf-core/pairgenomealign/main.nf' at line: 83 or see '.nextflow.log' file for more details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
-[nf-core/pairgenomealign] Pipeline completed with errors-
I also created target_genome_file.fa, but now I got the error:
ERROR ~ Error executing process > 'NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:PAIRALIGN_M2O:ALIGNMENT_LASTDB (target)'
Caused by:
Process requirement exceeds available CPUs -- req: 6; avail: 4
Command executed:
mkdir lastdb
lastdb \
-R01 -c -uYASS -S2 \
-P 6 \
lastdb/target \
target_genome_file.fa
cat <<-END_VERSIONS > versions.yml
"NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:PAIRALIGN_M2O:ALIGNMENT_LASTDB":
last: $(lastdb --version 2>&1 | sed 's/lastdb //')
END_VERSIONS
Command exit status:
-
Command output:
(empty)
Work dir:
/home/gigiux/Downloads/ALIGN/work/07/20c49e14a447e696f1bbf5c31dd86f
Container:
community.wave.seqera.io/library/last:1608--f41c047f7dc37e30
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
Yes, you can make a local nf.conf file and include it with -c nf.conf, using the following profile from the default configs as an example:
gitpod {
    executor.name   = 'local'
    executor.cpus   = 4
    executor.memory = 8.GB
    process {
        resourceLimits = [
            memory: 8.GB,
            cpus  : 4,
            time  : 1.h
        ]
    }
}
https://github.com/nf-core/pairgenomealign/blob/2.0.0/nextflow.config#L172-L183
Or, if this profile happens to fit your needs well, you can include it with the -profile option. You probably still need docker, so that would be -profile docker,gitpod. (not tested)
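If you would rather write your own nf.conf, here is a rough sketch (also not tested against the pipeline) that generates one from the number of CPUs Python detects on the machine. It only reuses the `resourceLimits` syntax from the profile quoted above; the function name `nf_conf_text` is made up, and the memory and time values are placeholders copied from that profile, to be adjusted to your computer.

```python
import os


def nf_conf_text(cpus: int, memory_gb: int, hours: int) -> str:
    """Return the text of a small Nextflow config that caps per-process resources."""
    return (
        "process {\n"
        "    resourceLimits = [\n"
        f"        memory: {memory_gb}.GB,\n"
        f"        cpus  : {cpus},\n"
        f"        time  : {hours}.h\n"
        "    ]\n"
        "}\n"
    )


if __name__ == "__main__":
    detected_cpus = os.cpu_count() or 1  # fall back to 1 if detection fails
    # memory_gb and hours are placeholder values taken from the quoted profile
    with open("nf.conf", "w") as handle:
        handle.write(nf_conf_text(detected_cpus, memory_gb=8, hours=1))
```

The resulting file would then be passed with -c nf.conf in addition to -profile docker. On the machine from the error message (avail: 4), this should cap the request of 6 CPUs at 4.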
According to the manual, Mauve is supposed to produce an identity matrix file: https://darlinglab.org/mauve/user-guide/files.html#:~:text=The%20identity%20matrix%20file,every%20homologous%20nucleotide%20was%20identical.
Thank you, I ran with
progressiveMauve --weight=15 --output=./out_file.mauve chr1.fa
where chr1.fa is a multifasta with the two genomes to align. The output files are out_file.mauve, out_file.mauve.backbone and out_file.mauve.bbcols, but they don't provide an identity score. I think I am missing an argument...
Perhaps the matrix generation requires more than two input genomes. Can you try providing one of the genomes two times in your input file?
I am not sure I can attach such a large file; would the ID of the genome be the same? NZ_CP184833.1 (https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP184833.1). Thank you
What I was thinking was that you could duplicate one of the genomes in this file (and give it a dummy name) and then see if mauve produces a percent identity file (when there are 3 genomes in the file).
I see, thanks. I ran mauve with 3 files as input and I got out_file.mauve.bbcols, out_file.mauve and out_file.mauve.backbone, but none with the identity value...
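If no identity file appears, one fallback is to compute a value from the alignment itself. The sketch below assumes that out_file.mauve is in XMFA format, that is, blocks of records with headers like `> 1:100-200 + genome.fa`, each block closed by a line starting with `=`. The function names `parse_xmfa` and `xmfa_identity` are made up for illustration, and the number returned (identical columns over columns where both genomes have a base, gaps ignored) is not necessarily what Mauve's own identity matrix reports.

```python
def parse_xmfa(text):
    """Return a list of blocks; each block maps a sequence number to its aligned string."""
    blocks, current, seq_id = [], {}, None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line.startswith("="):
            if current:
                blocks.append(current)
            current, seq_id = {}, None
        elif line.startswith(">"):
            # Header such as "> 1:100-200 + genome.fa": keep the number before ':'
            seq_id = int(line[1:].split(":")[0])
            current[seq_id] = []
        elif seq_id is not None:
            current[seq_id].append(line)
    if current:
        blocks.append(current)
    return [{key: "".join(parts) for key, parts in block.items()} for block in blocks]


def xmfa_identity(text, first=1, second=2):
    """Percent identity between two sequences over the blocks that contain both."""
    matches = 0
    compared = 0
    for block in parse_xmfa(text):
        if first not in block or second not in block:
            continue  # block is specific to one genome
        for a, b in zip(block[first].upper(), block[second].upper()):
            if a == "-" or b == "-":
                continue  # gap columns are ignored
            compared += 1
            if a == b:
                matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

It could be called as `xmfa_identity(open("out_file.mauve").read())`. Regions that Mauve left unaligned do not enter the calculation at all, so the value describes only the aligned portion of the two genomes.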