Dear Biostars,
I need your help. I am running MaSuRCA on a bacterial genome with paired-end reads plus long reads (Nanopore and PacBio reads combined in one file).
DATA
PE= il 75 11 /../../R1.fastq /../../R2.fastq (of course I am using a full path)
NANOPORE= /../../../both_longreads.fastq (of course I am using a full path)
END
Parameters in the config file:
PARAMETERS
EXTEND_JUMP_READS=0
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 0
USE_GRID=0
GRID_ENGINE=SGE
GRID_QUEUE=all.q
GRID_BATCH_SIZE=500000000
LHE_COVERAGE=25
MEGA_READS_ONE_PASS=0
LIMIT_JUMP_COVERAGE = 60
CA_PARAMETERS = ovlMerSize=30 cgwErrorRate=0.25 ovlMemory=4GB
CLOSE_GAPS=1
NUM_THREADS = 10
JF_SIZE = 160000000
SOAP_ASSEMBLY=0
FLYE_ASSEMBLY=1
END
The run fails at the Flye step with "Assembly with flye failed". The Flye log shows:
[2019-06-25 20:42:03] root: INFO: Starting Flye 2.4.1-release
[2019-06-25 20:42:03] root: DEBUG: Cmd: /bioappl/src/MaSuRCA/MaSuRCA-3.3.3/bin/../Flye/bin/flye -t 6 --nano-corr mr.41.15.15.0.02.1.fa -g 7566250 --kmer-size 21 -m 2500 -o flye -i 0
[2019-06-25 20:42:03] root: INFO: >>>STAGE: configure
[2019-06-25 20:42:03] root: INFO: Configuring run
[2019-06-25 20:42:04] root: ERROR: Invalid char while reading mr.41.15.15.0.02.1.fa
I have no idea what to do now. I would be very glad for any help. Dorota
It looks like you need to check that file. Does it contain anything other than A, C, T, G in the sequences?
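A quick way to answer that is a one-line awk check. This is a minimal sketch, assuming headers start with ">" and treating only A, C, G, T and N (either case) as valid, so IUPAC ambiguity codes such as R or Y would also be reported; the function name find_invalid_seq_lines is hypothetical.

```shell
#!/usr/bin/env bash
# find_invalid_seq_lines (hypothetical helper): print "line_number: content"
# for every sequence line (i.e. not starting with ">") that contains any
# character other than A, C, G, T or N. Stray Windows "\r" characters, text
# such as "perl: warning" inside a sequence, and headers glued onto the end
# of a sequence line are all reported.
find_invalid_seq_lines() {
  local fasta="$1"
  awk '!/^>/ && /[^ACGTNacgtn]/ { print NR ": " $0 }' "$fasta"
}
```

For example, `find_invalid_seq_lines mr.41.15.15.0.02.1.fa | head`; empty output means every sequence line is clean.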
Thank you, Genomax :) You are totally right; however, I have no idea how it got there:
Every run on the server shows me this:
Is this something the server administrator should fix? It is unbelievable that those warnings ended up at the end of almost every sequence in the file mentioned above.
Have you opened or edited any of these files on Windows and then moved them to Linux? It may just be a matter of running
dos2unix your_file.fa
to fix the line endings.
No, I am using only Linux. Thanks, I will fix the line endings. However, I do not know whether, after editing the file, MaSuRCA will resume from the point where it stopped. Now I know I really need to fix the warnings that are affecting my assembly :) Thank you again, Genomax :) I would never have thought those warnings were inside the file, disrupting my data and analysis.
The problem is not fixed. The file no longer contains any invalid characters, yet I am still getting the same error:
Does anyone have an idea what to do?
It seems that your OS does not support the "en_US.UTF-8" locale. Try setting it up with:
You can also try reconfiguring your locales with "dpkg-reconfigure locales".
(source: https://stackoverflow.com/questions/2499794/how-to-fix-a-locale-setting-warning-from-perl?page=1&tab=votes#tab-top)
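For reference, one common way to silence the Perl "Setting locale failed" warning for the current shell session is to export LANG and LC_ALL before re-running MaSuRCA. This is a sketch under assumptions: the helper name set_session_locale is hypothetical, it only affects the current shell (not the system-wide configuration that dpkg-reconfigure changes), and it falls back to the always-available C locale when en_US.UTF-8 is not installed.

```shell
#!/usr/bin/env bash
# set_session_locale (hypothetical helper): use en_US.UTF-8 if `locale -a`
# lists it (it is usually spelled "en_US.utf8" there), otherwise fall back
# to "C", which every POSIX system provides. Only the current shell and the
# programs it starts are affected. grep reads all input (no -q) so that
# `locale -a` is never cut off by SIGPIPE.
set_session_locale() {
  if locale -a 2>/dev/null | grep -iE '^en_US\.utf-?8$' > /dev/null; then
    export LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8
  else
    export LANG=C LC_ALL=C
  fi
}
```

Define the function, call set_session_locale, and then start ./assemble.sh from the same shell so the exported variables are inherited.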
Hi Corentin, the server admin has actually already fixed this warning:
"Please check that your locale settings: LANG = "en_US.UTF-8" are supported and installed on your system"
After the Perl issue was fixed, the Flye error is still unchanged. What did change is the mr.41.15.15.0.02.1.fa file, which previously contained the "perl: warning" text; now it contains only sequences.
I do not know what is wrong, since I am still getting the same error:
Just to get this out of the way: are you using "~" instead of "/home/username/" in your full paths? Sometimes "~" is not interpreted correctly.
Also, try reading the file with "cat -v mr.41.15.15.0.02.1.fa" and check whether any "^M" characters appear (these are the carriage returns of Windows line endings).
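If you prefer to count and clean these from the command line, here is a minimal sketch. It assumes bash (for the $'\r' quoting), and the helper names count_cr_lines and strip_cr are hypothetical. For an ordinary file with CRLF line endings, tr -d '\r' produces the same Unix line endings as dos2unix, and here it writes the result to a new file so the original stays untouched.

```shell
#!/usr/bin/env bash
# count_cr_lines (hypothetical): number of lines containing a carriage
# return ("\r", displayed as ^M by `cat -v`). grep -c prints 0 but exits
# with status 1 when nothing matches, hence the `|| true`.
count_cr_lines() {
  grep -c $'\r' "$1" || true
}

# strip_cr (hypothetical): write a copy of $1 to $2 with every carriage
# return removed, leaving the original file untouched.
strip_cr() {
  tr -d '\r' < "$1" > "$2"
}
```

For example, `count_cr_lines mr.41.15.15.0.02.1.fa` and, if it prints anything other than 0, `strip_cr mr.41.15.15.0.02.1.fa mr.fixed.fa`.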
As others have mentioned, you should check for anything other than ATCG in the sequences. For example, I noticed that FASTA output from canu-correct has a "$" at the end of some sequences, which generates the error you mentioned.
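To see which unexpected characters are present (a trailing "$", stray "\r", leftover warning text, and so on), it can help to tally them. A minimal sketch, assuming headers start with ">" and treating only A, C, G, T and N as valid; tally_invalid_chars is a hypothetical name. The loop uses substr() rather than split() with an empty separator, because the latter is not specified by POSIX awk.

```shell
#!/usr/bin/env bash
# tally_invalid_chars (hypothetical): for sequence lines only (lines not
# starting with ">"), count every character that is not A, C, G, T or N
# and print "count character", most frequent first.
tally_invalid_chars() {
  awk '!/^>/ {
         for (i = 1; i <= length($0); i++) {
           ch = substr($0, i, 1)
           if (ch !~ /[ACGTNacgtn]/) cnt[ch]++
         }
       }
       END { for (ch in cnt) print cnt[ch], ch }' "$1" | sort -k1,1nr
}
```

An empty result means the sequence lines contain only A, C, G, T and N.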
Hi,
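I had the same issue.
In my case, a newline (\n) was missing in TATTTTTAAGTATTTT[HERE]>contig_10.1_11288 in my file mr.41.15.17.0.029.1.fa, so the header was stuck to the end of the previous sequence line.
I fixed it by replacing every > with \n> and then removing the blank lines: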
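The fix described above can be sketched as follows. This is an assumption about how it could be done, not necessarily the exact command used here: it needs GNU sed (for "\n" in the replacement), and because it splits at every ">", it would also break a header whose description itself contains ">". The helper names count_glued_headers and fix_glued_headers are hypothetical.

```shell
#!/usr/bin/env bash
# count_glued_headers (hypothetical): number of lines where ">" appears
# after the first column, e.g. a header stuck to the end of a sequence
# line such as "...TATTTT>contig_10.1_11288".
count_glued_headers() {
  grep -c '.>' "$1" || true
}

# fix_glued_headers (hypothetical): put a newline before every ">" (GNU
# sed), then delete all empty lines, including the one this creates before
# the first header. Output goes to $2; the input file is not modified.
fix_glued_headers() {
  sed 's/>/\n>/g' "$1" | sed '/^$/d' > "$2"
}
```

Once count_glued_headers reports 0 on the new file, move it into place and rerun ./assemble.sh.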
Then I reran ./assemble.sh, and it now gets past
Running assembly with Flye ...
It works for me!