Question

Masurca Error With Illumina Assembly

1

11.6 years ago

Raygozak ★ 1.4k

Hi, I'm new to MaSuRCA and got this error while trying to run my first assembly. The output from MaSuRCA and the config file are below.

Thanks a lot

processing PE library reads Wed May 29 16:53:40 EDT 2013

Average PE read length 251

choosing kmer size of 175 for the graph

running Jellyfish Wed May 29 16:54:06 EDT 2013

MIN_Q_CHAR: 33

Error correction Poisson cutoff = 5

error correct PE Wed May 29 17:24:40 EDT 2013

terminate called after throwing an instance of 'jellyfish::file_parser::FileParserError'

what(): Empty input file 'pe.cor.fa'

./assemble.sh: line 58: 610 Aborted jellyfish count -p 126 -m 31 -t 12 -C -s $JF_SIZE -o k_u pe.cor.fa

ln: creating symbolic link `k_u_hash_0' to `k_u_0': File exists

terminate called after throwing an instance of 'mapped_file::ErrorMMap'

what(): Can't open file k_u_hash_0:

Estimated genome size:

Invalid uint64_t '-l' for [-n, --nb-mers=uint64]: Negative value

computing super reads from PE Wed May 29 17:29:02 EDT 2013

Super reads failed, check super1.err and files in ./work1/

config.txt:

PATHS

JELLYFISH_PATH=/gpfs/home/jzr186/work/tools/bin/

SR_PATH=/gpfs/home/jzr186/work/tools/bin/

CA_PATH=/gpfs/home/jzr186/work/tools/CA/Linux-amd64/bin

END

DATA

PE= pe 300 20 /gpfs/home/jzr186/scratch/CAMP/CAMP18/JH_R1_001.fastq /gpfs/home/jzr186/scratch/CAMP/CAMP18/JH_R2_001.fastq

END

PARAMETERS

GRAPH_KMER_SIZE=auto

USE_LINKING_MATES=1

JF_SIZE=1800000000

DO_HOMOPOLYMER_TRIM=0

NUM_THREADS=12

END

denovo illumina • 8.3k views

ADD COMMENT • link updated 10.8 years ago by sutturka ▴ 190 • written 11.6 years ago by Raygozak ★ 1.4k

score 1 · Answer 1 · 2013-05-30

11.6 years ago

rtliu ★ 2.2k

I would suggest starting your first MaSuRCA run with the test data from the MaSuRCA FTP site:

ftp://ftp.genome.umd.edu/pub/MaSuRCA/test_data/rhodobacter/

Start with the PE data only, then add the SJ and Sanger data.

Then double-check your own input data, e.g. with FastqPairedEndValidator.pl.
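As a rough illustration of what "double-check your input data" can mean in practice, here is a minimal shell sketch that confirms both mate files are non-empty and contain the same number of records. It is not FastqPairedEndValidator.pl and does not check read names; the function names `fastq_record_count` and `check_pairs` are made up for this example, and it assumes the usual four-lines-per-record FASTQ layout.

```shell
# Count FASTQ records, assuming exactly 4 lines per record
# (no wrapped sequence or quality lines).
fastq_record_count() {
    local lines
    lines=$(wc -l < "$1") || return 1
    echo $(( lines / 4 ))
}

# Succeed only if both mate files are non-empty and hold the same
# number of records. Hypothetical helper, not part of MaSuRCA.
check_pairs() {
    local r1="$1" r2="$2" n1 n2
    [ -s "$r1" ] && [ -s "$r2" ] || { echo "missing or empty input" >&2; return 1; }
    n1=$(fastq_record_count "$r1")
    n2=$(fastq_record_count "$r2")
    if [ "$n1" -ne "$n2" ]; then
        echo "record count mismatch: $n1 vs $n2" >&2
        return 1
    fi
    echo "OK: $n1 read pairs"
}
```

It could be called as `check_pairs JH_R1_001.fastq JH_R2_001.fastq` before starting the assembly. Passing this check does not prove the pairs are in matching order; it only catches truncated or empty files.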

Good luck!

Update 27-07-2013

MaSuRCA has finally released the config file for the rhodobacter test data:

ftp://ftp.genome.umd.edu/pub/MaSuRCA/test_data/rhodobacter/sr_config_Illumina_Sanger_1x.txt

ADD COMMENT • link 11.4 years ago by rtliu ★ 2.2k

0

I have a follow-up question on this: how do I add libraries progressively? You mentioned that you can use PE data only for the first round and then add SJ data. How exactly do I do this? Do you mean multiple rounds, or am I missing something obvious?

ADD REPLY • link 10.4 years ago by arnstrm ★ 1.9k

score 1 · Answer 2 · 2014-03-19

Hi,

I contacted the developers about this, and they said that read names do not matter during pre-processing of the data. They suggested that I run a test on my fastq file:

> file -b -i jumps.A.fastq

This gave me a result like:

text/x-python; charset=us-ascii

I emailed the result to the developers, and they explained that the operating system thinks the fastq file is Python code. This is not correct; the type should be text/plain.

The simple way to fix this is to open the expand_fastq script in the MaSuRCA bin folder and replace the line

    (text/plain*)
with
    (text/*)

Everything should work afterward.
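To see why the pattern change helps, here is a small sketch of the two `case` patterns applied to the MIME string reported by `file -b -i`. The helper names `is_text_strict` and `is_text_relaxed` are invented for this illustration, and the real expand_fastq script is probably structured differently; only the two patterns come from the fix above.

```shell
# Hypothetical helpers; only the case patterns are taken from the fix above.

# Original-style check: accepts only MIME types starting with text/plain.
is_text_strict() {
    case "$1" in
        (text/plain*) return 0 ;;
        (*) return 1 ;;
    esac
}

# Relaxed check: accepts any text/* MIME type, including text/x-python.
is_text_relaxed() {
    case "$1" in
        (text/*) return 0 ;;
        (*) return 1 ;;
    esac
}

# Example use on a real file (not run here):
#   mime=$(file -b -i jumps.A.fastq)
#   is_text_relaxed "$mime" && echo "treated as plain text"
```

With the string `text/x-python; charset=us-ascii`, the strict check fails and the relaxed check succeeds, which matches the behaviour described above: a fastq file misdetected as Python source is rejected by the first pattern and accepted by the second.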

After this change, I was able to run the assembler correctly, with JF_SIZE set to a very high value (JF_SIZE=1800000000).

Thanks, Sagar

score 0 · Answer 3 · 2013-07-24

11.4 years ago

jc.szamosi ▴ 50

I've been having the same problem. The test data doesn't help. My data is PE only, the paired ends have been checked, and the error happens for some genomes but not others, with no apparent pattern of read length, GC content, or anything else I can think of.

ADD COMMENT • link 11.4 years ago by jc.szamosi ▴ 50

score 0 · Answer 4 · 2013-09-30

11.2 years ago

luisnevescunha • 0

Me too, and it is so annoying! I thought it was a memory (RAM) problem, but then I tried re-running with some libraries that had worked well in the past and got the same error.

Anyone with a solution?

Luis

ADD COMMENT • link 11.2 years ago by luisnevescunha • 0