MaSuRCA, Flye assembly error
5.4 years ago
milady81

Dear Biostars,

I need your help. I am running MaSuRCA on a bacterial genome with paired-end reads plus long reads (Nanopore and PacBio reads combined in one file). The data section of my config file:

DATA
PE= il 75 11 /../../R1.fastq /../../R2.fastq
NANOPORE= /../../../both_longreads.fastq
END

(The paths are shortened here; in the actual config file I use full paths.)

Parameters in the config file:

PARAMETERS
EXTEND_JUMP_READS=0
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 0
USE_GRID=0
GRID_ENGINE=SGE
GRID_QUEUE=all.q
GRID_BATCH_SIZE=500000000
LHE_COVERAGE=25
MEGA_READS_ONE_PASS=0
LIMIT_JUMP_COVERAGE = 60
CA_PARAMETERS = ovlMerSize=30 cgwErrorRate=0.25 ovlMemory=4GB
CLOSE_GAPS=1
NUM_THREADS = 10
JF_SIZE = 160000000
SOAP_ASSEMBLY=0
FLYE_ASSEMBLY=1
END

The run fails at the Flye assembly step with "Assembly with flye failed". The Flye log shows:

[2019-06-25 20:42:03] root: INFO: Starting Flye 2.4.1-release
[2019-06-25 20:42:03] root: DEBUG: Cmd: /bioappl/src/MaSuRCA/MaSuRCA-3.3.3/bin/../Flye/bin/flye -t 6 --nano-corr mr.41.15.15.0.02.1.fa -g 7566250 --kmer-size 21 -m 2500 -o flye -i 0
[2019-06-25 20:42:03] root: INFO: >>>STAGE: configure
[2019-06-25 20:42:03] root: INFO: Configuring run
[2019-06-25 20:42:04] root: ERROR: Invalid char while reading mr.41.15.15.0.02.1.fa

I have no idea what to do now. I would be very glad for any help.

Dorota

ERROR: Invalid char while reading mr.41.15.15.0.02.1.fa

It looks like you need to check that file. Does it contain anything other than A, C, T, G in the sequences?
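A quick way to check is to list the offending sequence lines directly. This is a minimal sketch, assuming a plain (uncompressed) FASTA file; fasta_invalid_lines is a hypothetical helper name, and it accepts only A, C, G, T in either case, so invisible characters such as a Windows carriage return are flagged too:

```shell
# Hypothetical helper: print "line_number: line" for every sequence line
# (any line not starting with '>') that contains a character other than
# A, C, G or T (upper or lower case). Add N to the bracket list if your
# tool accepts ambiguous bases.
fasta_invalid_lines() {
    awk '!/^>/ && /[^ACGTacgt]/ { print NR ": " $0 }' "$1"
}
```

For example, fasta_invalid_lines mr.41.15.15.0.02.1.fa | head shows the first offending lines with their line numbers; no output means no stray characters were found in the sequence lines.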

Thank you, Genomax :) You are totally right, but I have no idea how this got there:

m54293_190222_151630/43319410/0_37246.33848_2905
ACGGAAGGCGGCCCAGCATCTCGCGGCTTTGCAGCAGTTCCAGCACGGTCTCGCGCCAGTGGTCGGCTCAGTTTGTCGATTCCGTTGAGCGTCATTCCGTCCAGGTTGGCGCGGATCTCGAACCGCATGCCGTCGCCGACCGGATAGGACTTCGGGAAGATGTAGCGGATGATGAATTCCCCGTCGTATTperl: warning: Setting locale failed.
perl: warning: Please check that your locale settings: LANGUAGE = (unset), LC_ALL = (unset), LC_CTYPE = "UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

Every run on the server prints this:

[Tue Jun 25 20:36:28 CEST 2019] Running locally in 1 batch
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings: LANGUAGE = (unset), LC_ALL = (unset), LC_CTYPE = "UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

Is this something the server administrator should fix? I can hardly believe that these warnings ended up at the end of almost every sequence in the file mentioned above.

Have you opened or edited any of these files on Windows and then moved them to Linux? If so, it may just be a matter of running dos2unix your_file.fa to fix the line endings.
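If dos2unix is not installed, the standard tr utility can do the same line-ending conversion. A minimal sketch; strip_cr is a hypothetical helper name, and note that it deletes every carriage return in the file, not only those at line ends:

```shell
# Hypothetical helper: remove all carriage returns ("\r", the extra byte in
# Windows CRLF line endings) from file $1 and write the result to file $2.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

Usage: strip_cr your_file.fa your_file.unix.fa, then check the new file before replacing the original with it.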

No, I am using only Linux. Thanks, I will fix the line endings anyway. However, I do not know whether MaSuRCA will resume from the point where it stopped after I edit the file. Now I know I really need to fix the warnings, since they affect my assembly :) Thanks a lot again, Genomax :) I would never have thought those warnings could end up inside the file and disrupt my data and analysis.

The problem is not fixed. The file mr.41.15.15.0.02.1.fa no longer contains any invalid characters, but I still get the same error:

ERROR: Invalid char while reading mr.41.15.15.0.02.1.fa

Does anyone have an idea what to do?

Please check that your locale settings: LANG = "en_US.UTF-8" are supported and installed on your system.

It seems that your OS does not support "en_US.UTF-8". As a quick test, run Perl with the C locale and check whether the warning disappears:

LANG=C perl -e exit

You can also try reconfiguring your locales with "dpkg-reconfigure locales" (on Debian/Ubuntu systems).

(source: https://stackoverflow.com/questions/2499794/how-to-fix-a-locale-setting-warning-from-perl?page=1&tab=votes#tab-top)
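If the en_US.UTF-8 locale cannot be installed, one workaround is to switch to the C locale, which every system provides. This is a sketch of lines you might add to ~/.bashrc or to the script that launches assemble.sh (where you put them is an assumption about your setup):

```shell
# Use the always-available C locale. LC_ALL overrides every LC_* variable,
# including the invalid LC_CTYPE="UTF-8" reported in the Perl warning.
export LANG=C
export LC_ALL=C
```

After this, perl -e exit should run without printing any locale warning. The MaSuRCA step that produced the corrupted file would then need to be rerun so that it is regenerated cleanly.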

Hi Corentin, the server admin has actually already fixed the warning:

"Please check that your locale settings: LANG = "en_US.UTF-8" are supported and installed on your system"

After the Perl issue was fixed, the Flye error still did not change. The mr.41.15.15.0.02.1.fa file did change, though: the Perl warnings that were in it before are gone, and it now contains only sequences.

I do not know why I am still getting:

ERROR: Invalid char while reading mr.41.15.15.0.02.1.fa

Just to get this out of the way: are you using "~" instead of "/home/username/" in your full paths? "~" is sometimes not interpreted correctly.

Also, try viewing the file with "cat -v mr.41.15.15.0.02.1.fa" and check whether any "^M" characters appear (these are the carriage returns of Windows line endings).
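On a large FASTA file it is easier to count such lines than to scan "cat -v" output by eye. A minimal sketch; count_cr_lines is a hypothetical helper name:

```shell
# Hypothetical helper: print how many lines of file $1 end with a carriage
# return (shown as "^M" by cat -v). 0 means plain Unix (LF) line endings.
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}
```

Usage: count_cr_lines mr.41.15.15.0.02.1.fa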

As others have mentioned, you should check for anything other than A, T, C, G in the sequences. For example, I noticed that FASTA output from the canu-correct software has a "$" at the end of some sequences, which triggers the error you mentioned.
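If "$" turns out to be the only stray character, it can be removed from the sequence lines while header lines are left alone. A minimal sketch, assuming every header line starts with ">"; strip_dollar is a hypothetical helper name:

```shell
# Hypothetical helper: delete every "$" from sequence lines (lines not
# starting with '>') of file $1, keep headers unchanged, write to file $2.
strip_dollar() {
    sed '/^>/!s/\$//g' "$1" > "$2"
}
```

Usage: strip_dollar reads.fa reads.clean.fa, then inspect the result before feeding it to Flye.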

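Hi,

I had the same issue. In my case a newline (\n) was missing in my file mr.41.15.17.0.029.1.fa, at the spot marked [HERE]:

TATTTTTAAGTATTTT[HERE]>contig_10.1_11288

I fixed it by replacing ">" with "\n>" (a newline followed by ">") and then removing the blank lines this creates (this relies on GNU sed, which understands \n in the replacement):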

cp mr.41.15.17.0.029.1.fa mr.41.15.17.0.029.1.fa.old
sed 's/>/\n>/' mr.41.15.17.0.029.1.fa.old | grep -vP "^$" > mr.41.15.17.0.029.1.fa

Then I reran ./assemble.sh, and it continued to

Running assembly with Flye ...

It works for me!
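To confirm that the repair above worked before rerunning ./assemble.sh, one can count lines where ">" still appears after the first column, i.e. a header glued onto the end of a sequence line. A minimal sketch; count_glued_headers is a hypothetical helper name, and it would also count headers whose description text itself contains ">":

```shell
# Hypothetical helper: count lines of file $1 where ">" occurs somewhere
# other than the first character. 0 means every header starts its own line.
count_glued_headers() {
    awk 'index($0, ">") > 1 { n++ } END { print n + 0 }' "$1"
}
```

Usage: count_glued_headers mr.41.15.17.0.029.1.fa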
