ERROR: non-ACGTN characters in genome
2.6 years ago
Neel

Hi, I am trying to run QUAST and I encountered this error. Please help me.

quast /home/bvs/neelam/complete_genome/complete.fna -o ./outdir1

Version: 5.0.2

System information:
  OS: Linux-5.8.0-59-generic-x86_64-with-debian-bullseye-sid (linux_64)
  Python version: 3.7.12
  CPUs number: 96

Started: 2022-05-02 12:30:13

Logging to /home/bvs/neelam/complete_genome/outdir1/quast.log
NOTICE: Maximum number of threads is set to 24 (use --threads option to set it manually)

CWD: /home/bvs/neelam/complete_genome
Main parameters:
  MODE: default, threads: 24, minimum contig length: 500, minimum alignment length: 65, \
  ambiguity: one, threshold for extensive misassembly size: 1000

Contigs:
  Pre-processing...

**ERROR! Skipping /home/bvs/neelam/complete_genome/complete.fna because it contains non-ACGTN characters.**

Thank you!


My genome.fna file looks like this:

CP040127.1 GCTTGCCGAACTGCTGGTGCAGCTCCGGATTTTCCAGTTCGCCCAGGCCTTTGCGCGCGA TCACCACGATATCCCAGCCAGCCAGGGTTTCCTGGTTATGGCGGAACGATTCGCGGATCA GGCGTTTGAGGCGATTGCGCTGGACGGCGAGCTTGACGTTCTTCTTGCCGATCACCAGGC CCAGGCGGGGGTGATCGAGACCGTTCTCGCGCGCCAGCAGCAGGACGTGCTTGCCGGGGA CCTTGCCGGTCGGAGAGTCGAAGACTGCGCTGAATTGCCGGGCTGTCAGTAGACGCTTGT CCCGGTCGAAGTCCCGACTCACCACCCGTACCGGATAAATCAGACGGTCAGACGCTTACG GCCTTTGGCGCGACGACGCGACAGAACCTGACGGCCGTTCTTGGTGGCCATACGGGCGCG GAAACCGTGGACGCGAGCGCGCTTGAGGGTGCTGGGTTGGAAAGTACGTTTCATGATTCG GTACCTGGGTTGACGACTTGAGGTCGCAGTGACCCCGTTTAAAGAGACCGGCGATTCTAG TGAAATCGAACGGGCAGGTCAATTTCCAACCAGCGATGACGTAATAGATAGATACAAGGA AGTCATTTTTCTTTTAAAGGATAGAAACGGTTAATGCTCTTGGGACGGCGCTTTTCTGTG CATAACTCGATGAAGCCCAGCAATTGCGTGTTTCTCCGGCAGGCAAAAGGTTGTCGAGAA CCGGTGTCGAGGCTGTTTCCTTCCTGAGCGAAGCCTGGGGATGAACGAGATGGTTATCCA CAGCGGTTTTTTCCACACGGCTGTGCGCAGGGATGTACCCCCTTCAAAGCAAGGGTTATC CACAAAGTCCAGGACGACCGTCCGTCGGCCTGCCTGCTTTTATTAAGGTCTTGATTTGCT TGGGGCCTCAGCGCATCGGCATGTGGATAAGTCCGGCCCGTCCGGCTACAATAGGCGCTT ATTTCGTTGTGCCGCCTTTCCAATCTTTGGGGGATATCCGTGTCCGTGGAACTTTGGCAG CAGTGCGTGGATCTTCTCCGCGATGAGCTGCCGTCCCAACAATTCAACACCTGGATCCGT CCCTTGCAGGTCGAAGCCGAAGGCGACGAATTGCGTGTGTATGCACCCAACCGTTTCGTC CTCGATTGGGTGAACGAGAAATACCTCGGTCGGCTTCTGGAACTGCTCGGTGAACGCGGC GAGGGTCAGTTGCCCGCGCTTTCCTTATTAATAGGCAGCAAGCGTAGCCGTACGCCGCGC GCCGCCATTGTCCCATCGCAGACCCACGTGGCTCCCCCGCCTCCGGTTGCTCCGCCGCCG GCGCCAGTGCAGCCGGTATCGGCCGCGCCCGTGGTAGTGCCACGTGAAGAGCTGCCGCCA GTGACGACGGCTCCCAGCGTTTCGAGCGATCCCTACGAGCCGGAAGAACCCAGCATCGAT CCGCTGGCCGCCGCCATGCCGGCCGGAGCAGCGCCTGCGGTGCGCACCGAGCGCAACGTC CAGGTCGAAGGTGCGCTGAAGCACACCAGCTATCTCAACCGTACCTTCACCTTCGAGAAC TTCGTCGAGGGCAAGTCCAACCAGTTGGCCCGCGCCGCCGCCTGGCAGGTGGCGGACAAC CTCAAGCACGGCTACAACCCGCTGTTCCTCTACGGTGGCGTCGGCCTGGGCAAGACCCAC CTGATGCATGCGGTGGGCAACCACCTGCTGAAGAAGAACCCGAACGCCAAGGTGGTCTAC CTGCATTCGGAACGTTTCGTCGCGGACATGGTGAAGGCCTTGCAGCTCAACGCCATCAAC GAATTCAAGCGCTTCTACCGCTCGGTGGACGCACTGTTGATCGACGACATCCAGTTCTTC GCCCGTAAGGAGCGCTCCCAGGAGGAGTTCTTCCACACCTTCAATGCCCTTCTCGAAGGC 
GGCCAGCAGGTGATCCTCACCAGCGACCGCTATCCGAAGGAAATCGAAGGCCTGGAAGAG CGGCTGAAATCCCGCTTCGGCTGGGGCCTGACGGTGGCCGTCGAGCCGCCGGAACTGGAA ACCCGGGTGGCGATCCTGATGAAGAAGGCCGAGCAGGCGAAGATCGAGCTGCCGCACGAT GCGGCCTTCTTCATCGCCCAGCGCATCCGTTCCAACGTGCGTGAACTGGAAGGTGCGCTG AAGCGGGTGATCGCCCACTCGCACTTCATGGGCCGGCCGATCACCATCGAGCTGATTCGC GAGTCGCTGAAGGACCTGTTGGCCCTTCAGGACAAGCTGGTCAGCATCGACAACATCCAG CGCACCGTCGCCGAGTACTACAAGATCAAGATATCCGATCTGTTGTCCAAGCGGCGTTCG CGCTCGGTGGCGCGCCCGCGCCAGGTGGCCATGGCGCTCTCCAAGGAGCTGACCAACCAC AGCCTGCCGGAGATCGGCGTGGCCTTCGGCGGTCGGGATCACACCACGGTGTTGCACGCC TGTCGTAAGATCGCTCAACTTAGGGAATCCGACGCGGATATCCGCGAGGACTACAAGAAC CTGCTGCGTACCCTGACAACCTGACGCAGCCCACGAGGCAAGGGACTAGACCATGCATTT CACCATTCAACGCGAAGCCCTGTTGAAACCGCTGCAACTGGTCGCCGGCGTCGTGGAACG CCGCCAGACATTGCCGGTTCTCTCCAACGTCCTGCTGGTGGTCGAAGGCCAGCAACTGTC GCTGACCGGCACCGACCTCGAAGTCGAGCTGGTTGGTCGCGTGGTACTGGAAGATGTCGC CGAACCCGGCGAGATCACCGTACCGGCGCGCAAGCTGATGGACATCTGCAAGAGCCTGCC GAACGACGTGCTGATCGACATCCGTGTCGAAGAGCAGAAACTCCTGGTGAAGGCCGGGCG TAGCCGCTTCACCCTGTCCACCCTGCCGGCCAACGATTTCCCCACCGTGGAGGAAGGTCC CGGCTCGCTGAACTTCAGCATTGCCCAGAGCAAGCTGCGTCGCCTGATCGACCGCACCAG CTTCGCCATGGCCCAGCAGGACGTGCGTTACTACCTCAACGGCATGCTGCTGGAAGTGAA--

If anyone knows how to remove the -- from the end of the genome file, please let me know.


I am pretty sure your problem is just the malformed header: a FASTA header line has to start with >. Change it to

>CP040127.1
GCTTGCCGAACTGCTGGTGCAGCTCCGGATTTTCCAGTTCGCCCAGGCCTTTGCGCGCGA 

and try again. If there really are other issues with the file, you can fix them with tr, sed, or reformat.sh from the BBTools suite:

reformat.sh in=/home/bvs/neelam/complete_genome/complete.fna out=/home/bvs/neelam/complete_genome/complete_fixed.fna fixjunk=t iupacToN=t dotdashxton=t fixheaders=t
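If you would rather not install BBTools, a plain sed pipeline can do the same two fixes. This is a minimal sketch, not a drop-in replacement for reformat.sh: it assumes the header sits on its own line (possibly missing its >), and it treats every character other than A, C, G, T or N in a sequence line as junk. The function name fix_fasta is hypothetical.

```sh
# Hypothetical helper (not part of QUAST or BBTools): clean a FASTA file so
# that every sequence line contains only A, C, G, T or N.
# Usage: fix_fasta in.fna > out.fna   (reads stdin if no file is given)
fix_fasta() {
    # 1s/^[^>]/>&/      : if line 1 does not start with '>', prepend one
    # /^>/b             : print header lines unchanged and skip the rest
    # s/[[:space:]]//g  : drop stray spaces, tabs and '\r' in sequence lines
    # s/-//g            : delete dashes, e.g. the trailing "--"
    # y/acgtn/ACGTN/    : upper-case soft-masked bases
    # s/[^ACGTN]/N/g    : turn anything left (IUPAC codes etc.) into N
    LC_ALL=C sed \
        -e '1s/^[^>]/>&/' \
        -e '/^>/b' \
        -e 's/[[:space:]]//g' \
        -e 's/-//g' \
        -e 'y/acgtn/ACGTN/' \
        -e 's/[^ACGTN]/N/g' \
        "$@"
}
```

For the file in this thread that would be `fix_fasta /home/bvs/neelam/complete_genome/complete.fna > complete_fixed.fna`, then rerun QUAST on complete_fixed.fna. Unlike dotdashxton=t, this deletes dashes instead of converting them to N: fine for a stray trailing "--", but it shifts coordinates if the dashes are alignment gaps inside the sequence.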

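What is the output of these two commands?

grep -v '^>' /home/bvs/neelam/complete_genome/complete.fna | tr -d 'ATGCN\n'

file /home/bvs/neelam/complete_genome/complete.fna

The first prints every character in the sequence lines other than A, T, G, C, N and newlines; the second shows what kind of text file it is, including whether it has CRLF (Windows) line terminators.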
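The grep/tr check prints all unexpected characters as one run, which is hard to read for a whole genome. A small variation, sketched here as a hypothetical helper count_bad_chars, tallies each offending character instead. It assumes header lines start with >; if the header really is missing its >, characters from CP040127.1 such as P, the digits and the dot would show up in the tally, alongside the - from the trailing "--". Empty output means the sequence lines are clean.

```sh
# Hypothetical helper: tally every sequence character outside A/C/G/T/N,
# using the same grep/tr filter as the check above.
# Usage: count_bad_chars complete.fna   (reads stdin if no file is given)
count_bad_chars() {
    grep -v '^>' "$@" |     # drop header lines (those starting with '>')
        tr -d 'ACGTN\n' |   # keep only the unexpected characters
        fold -b -w 1 |      # one byte per line (-b: '\r' and tabs count as bytes)
        LC_ALL=C sort |     # group identical characters together
        uniq -c             # print "count character" pairs
}
```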
