How to generate identity score between aligned sequences with Mauve?
13 days ago

Hello,

Is it possible to generate an identity score between two sequences using Mauve? Or is there other software that can provide such a value?

Thank you

identity score alignment mauve • 888 views
ADD COMMENT

Thank you, I ran progressiveMauve --weight=15 --output=./out_file.mauve chr1.fa where chr1.fa is a multifasta with the two genomes to align. The output files are out_file.mauve, out_file.mauve.backbone and out_file.mauve.bbcols, but none of them provides an identity score. I think I am missing an argument...
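In case it helps, here is a minimal Python sketch of one way to derive an identity value from the alignment itself. It assumes that out_file.mauve is in the XMFA format (blocks of `> seq:start-end strand file` entries, each block closed by a line starting with `=`), that the two genomes are sequences 1 and 2, and that gap columns count as mismatches. The function names `parse_xmfa` and `xmfa_identity` are made up for this illustration; they are not part of Mauve.

```python
def parse_xmfa(text):
    """Yield one dict per alignment block: {sequence_number: aligned_string}."""
    block, seq_id = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line.startswith("="):
            if block:
                yield block  # "=" closes the current block
            block, seq_id = {}, None
        elif line.startswith(">"):
            # Entry header such as "> 1:1-8 + chr1.fa": keep the sequence number
            seq_id = int(line[1:].split()[0].split(":")[0])
            block[seq_id] = ""
        elif seq_id is not None:
            block[seq_id] += line  # aligned sequence may span several lines
    if block:
        yield block  # tolerate a missing final "="


def xmfa_identity(text, first=1, second=2):
    """Percent identity over blocks containing both sequences (gaps = mismatches)."""
    matches = columns = 0
    for block in parse_xmfa(text):
        if first not in block or second not in block:
            continue  # region present in only one genome
        for x, y in zip(block[first].upper(), block[second].upper()):
            if x == "-" and y == "-":
                continue
            columns += 1
            if x == y:
                matches += 1
    return 100.0 * matches / columns if columns else 0.0


example = """#FormatVersion Mauve1
> 1:1-8 + chr1.fa
ACGTACGT
> 2:1-7 + chr1.fa
ACGT-CGA
=
> 1:9-12 + chr1.fa
GGCC
=
"""
print(xmfa_identity(example))
```

On a real run this would be called as `xmfa_identity(open("out_file.mauve").read())`. Note that the value only covers the regions Mauve actually aligned, so it is not the same thing as an ANI estimate.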

ADD REPLY

Perhaps the matrix generation requires more than two input genomes. Can you try providing one of the genomes two times in your input file?

ADD REPLY

I am not sure I can attach such a large file; would the ID of the genome be the same? NZ_CP184833.1 (https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP184833.1). Thank you

ADD REPLY

where chr1.fa is a multifasta with the two genomes to align.

What I was thinking was that you could duplicate one of the genomes in this file (and give it a dummy name) and then see if Mauve produces a percent identity file (when there are 3 genomes in the file).

ADD REPLY

I see, thanks. I ran Mauve with 3 files as input and got out_file.mauve.bbcols, out_file.mauve and out_file.mauve.backbone, but none of them contains the identity value...

ADD REPLY
13 days ago
GenoMax 151k

FastANI has been recommended by users here for this type of comparison (since you are mentioning Mauve you are likely referring to genomes): https://github.com/ParBLiSS/FastANI

ADD COMMENT

Thank you, I used skani, but it requires building a model of several genomes together. Can I use it for only two genomes?

ADD REPLY
13 days ago
Charles Plessy ★ 2.9k

The nf-core/pairgenomealign pipeline, which I develop, reports a percent identity score for pairs of genomes. In the current version (2.0.0) gaps are counted as mismatches, but I plan to update it so that an alternative score ignoring gaps is also provided. The pipeline uses LAST as an aligner. If you run the last-train program directly or through the pipeline, you will also find an estimate of the percent similarity (ignoring gaps) in the trained parameter file that it outputs.
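To make the difference between the two scores concrete, here is a small Python sketch. It is only an illustration of the two definitions, not the pipeline's own code; the function name `percent_identity` and the toy sequences are invented for the example.

```python
def percent_identity(a, b, ignore_gaps=False):
    """Percent identity between two aligned sequences of equal length.

    With ignore_gaps=False a column with a gap in one sequence counts as a
    mismatch; with ignore_gaps=True such columns are left out entirely.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column gapped in both sequences carries no information
        has_gap = x == "-" or y == "-"
        if has_gap and ignore_gaps:
            continue
        compared += 1
        if not has_gap and x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


aln_a = "ACGT-ACGTA"
aln_b = "ACGTTACCTA"
print(percent_identity(aln_a, aln_b))                    # gaps as mismatches
print(percent_identity(aln_a, aln_b, ignore_gaps=True))  # gaps ignored
```

In this toy alignment there are 8 matching columns, 1 mismatch and 1 gap, so the first score is 8/10 and the second is 8/9. The gap-ignoring value is always at least as high as the other one.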

ADD COMMENT

Thank you. I prepared the samplesheet.csv file as

HoVa25_chr1, chr1_out.fa
Query_1, Valg_chr1_ref.fa
Query_2, Valg_V25_chr1_Cns.fa

opened the terminal in the folder containing all the files and ran nextflow run nf-core/pairgenomealign --target ./target_genome_file.fa --input ./samplesheet.csv --outdir ./results -profile docker but I got the error:

ERROR ~ Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* --input (./samplesheet.csv): Validation of file failed:
    -> Entry 1: Missing required field(s): sample, fasta
    -> Entry 2: Missing required field(s): sample, fasta

What is the correct syntax?

ADD REPLY

Looks like a file header is required for the csv file (as below). Try this file.

sample,fasta
HoVa25_chr1,chr1_out.fa
Query_1,Valg_chr1_ref.fa
Query_2,Valg_V25_chr1_Cns.fa
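
If you want to catch this kind of problem before launching the pipeline, a rough pre-flight check could look like the Python sketch below. It is not the pipeline's validator: the helper name `check_samplesheet` is hypothetical, and the rules (a `sample,fasta` header, no spaces in paths, one of the accepted FASTA extensions, file must exist) are only what can be read from the error messages in this thread.

```python
import csv
import io
import os

REQUIRED = ("sample", "fasta")
EXTENSIONS = (".fa", ".fa.gz", ".fna", ".fna.gz", ".fasta", ".fasta.gz")


def check_samplesheet(text, base_dir="."):
    """Return a list of human-readable problems found in a samplesheet."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        # Without the header line, the first data row is read as column names
        return [f"header is missing column(s): {', '.join(missing)}"]
    problems = []
    for n, row in enumerate(reader, start=1):
        path = row.get("fasta") or ""
        if " " in path:
            problems.append(f"entry {n}: path contains spaces: {path!r}")
        elif not path.endswith(EXTENSIONS):
            problems.append(f"entry {n}: unexpected extension: {path!r}")
        elif not os.path.isfile(os.path.join(base_dir, path)):
            problems.append(f"entry {n}: file not found: {path!r}")
    return problems


no_header = "HoVa25_chr1, chr1_out.fa\nQuery_1, Valg_chr1_ref.fa\n"
print(check_samplesheet(no_header))
```

An empty list means that none of these particular checks failed; the pipeline may still apply rules that are not covered here.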
ADD REPLY

Thank you, I tried but this time I got:

ERROR ~ Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
The following invalid input values have been detected:

* --input (./samplesheet.csv): Validation of file failed:
    -> Entry 1: Error for field 'fasta' (chr1_out.fa): the file or directory 'chr1_out.fa' does not exist (Fasta file for genomes must be provided, cannot contain spaces and must have extension `.fa`, `.fa.gz`, `.fna`, `.fna.gz`, `.fasta` or `.fasta.gz`)

I then created the file chr1_out.fa in the working folder but then I got:

[-        ] NFC…AIRGENOMEALIGN:CUTN_TARGET -
[-        ] NFC…PAIRGENOMEALIGN:CUTN_QUERY -
[-        ] NFC…AIRGENOMEALIGN:CUTN_TARGET -
[-        ] NFC…PAIRGENOMEALIGN:CUTN_QUERY -
[-        ] NFC…IRGENOMEALIGN:ASSEMBLYSCAN -
[-        ] NFC…IQC_ASSEMBLYSCAN_PLOT_DATA -
[-        ] NFC…ALIGN_M2O:ALIGNMENT_LASTDB -
[-        ] NFC…RALIGN_M2O:ALIGNMENT_TRAIN -
[-        ] NFC…N_M2O:ALIGNMENT_LASTAL_M2O -
[-        ] NFC…_M2O:ALIGNMENT_DOTPLOT_M2O -
[-        ] NFC…GN_M2O:ALIGNMENT_SPLIT_O2O -
[-        ] NFC…_M2O:ALIGNMENT_DOTPLOT_O2O -
[-        ] NFC…GN:PAIRGENOMEALIGN:MULTIQC -
ERROR ~ No such file or directory: /home/gigiux/Downloads/ALIGN/target_genome_file.fa

 -- Check script '/home/gigiux/.nextflow/assets/nf-core/pairgenomealign/main.nf' at line: 83 or see '.nextflow.log' file for more details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details
-[nf-core/pairgenomealign] Pipeline completed with errors-
ADD REPLY

ERROR ~ No such file or directory: /home/gigiux/Downloads/ALIGN/target_genome_file.fa

Looks like the error is now with a different file.

ADD REPLY

I also created target_genome_file.fa but now I got the error:

ERROR ~ Error executing process > 'NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:PAIRALIGN_M2O:ALIGNMENT_LASTDB (target)'

Caused by:
  Process requirement exceeds available CPUs -- req: 6; avail: 4


Command executed:

  mkdir lastdb
  lastdb \
      -R01 -c -uYASS -S2 \
      -P 6 \
      lastdb/target \
      target_genome_file.fa

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_PAIRGENOMEALIGN:PAIRGENOMEALIGN:PAIRALIGN_M2O:ALIGNMENT_LASTDB":
      last: $(lastdb --version 2>&1 | sed 's/lastdb //')
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /home/gigiux/Downloads/ALIGN/work/07/20c49e14a447e696f1bbf5c31dd86f

Container:
  community.wave.seqera.io/library/last:1608--f41c047f7dc37e30

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details
ADD REPLY

Process requirement exceeds available CPUs -- req: 6; avail: 4

You will need to adjust the hardware configuration: the process requests 6 CPUs but only 4 are available.

ADD REPLY

Yes, you can make a local nf.conf file and include it with -c nf.conf, modelled on the following profile from the default configs:

    gitpod {
        executor.name           = 'local'
        executor.cpus           = 4
        executor.memory         = 8.GB
        process {
            resourceLimits = [
                memory: 8.GB,
                cpus  : 4,
                time  : 1.h
            ]
        }
    }

https://github.com/nf-core/pairgenomealign/blob/2.0.0/nextflow.config#L172-L183

Or if this profile happens to fit your needs well, you can select it with the -profile option. You probably still need docker, so that would be -profile docker,gitpod. (not tested)
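
For the local-file route, the following Python sketch writes out a candidate nf.conf sized to the machine it runs on. It is untested against the pipeline and rests on several assumptions: that a top-level `process { resourceLimits = [...] }` block in a file passed with -c is enough to cap the requests, that 8 GB of memory is available, and that the 1-hour limit copied from the profile above is long enough for your genomes. The function name `render_nf_config` is invented for this example.

```python
import os


def render_nf_config(cpus, memory_gb, hours=1):
    """Return the text of a small Nextflow config capping process resources."""
    return (
        "// nf.conf: cap every process at what this machine can offer\n"
        "process {\n"
        "    resourceLimits = [\n"
        f"        cpus  : {cpus},\n"
        f"        memory: {memory_gb}.GB,\n"
        f"        time  : {hours}.h\n"
        "    ]\n"
        "}\n"
    )


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single CPU
    print(render_nf_config(cpus=os.cpu_count() or 1, memory_gb=8))
```

Saving the printed text as nf.conf and adding -c nf.conf to the nextflow run command would then be the next thing to try.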

ADD REPLY
