Question

DiscovarDenovo - fastq files should be interlaced

0

7.2 years ago

Kenny ▴ 30

Hi all,

I have an Illumina TruSeq long-read FASTQ file and I would like to run DiscovarDeNovo to assemble the genome.

Here's my code:

DiscovarDeNovo READS=TSSLR-BDA11_LongRead.fastq.gz OUT_DIR=./TSSLR_DISCOVAR

But I got this message:

The file 
TSSLR-BDA11_LongRead.fastq.gz
should be interlaced and hence have an even number of entries.  It does not.

Wondering why?

Kenny

Assembly discovar fastq interlaced • 2.8k views

updated 7.2 years ago by maxwhjohn1988 ▴ 130 • written 7.2 years ago by Kenny ▴ 30

1

7.2 years ago

colindaven 7.0k

You need to interleave your R1 and R2 files.

Try BBMap's reformat.sh for this.

 reformat.sh | grep inter
 Description:  Reformats reads to change ASCII quality encoding, interleaving, file format, or compression format.
 **If input is paired and there is only one output file, it will be written interleaved.**
 int=f                   (interleaved) Determines whether INPUT file is considered interleaved.
 verifyinterleaved=f     (vint) sets 'vpair' to true and 'interleaved' to true.
 addslash=f              Append ' /1' and ' /2' to read names, if not already present.  Please include the flag 'int=t' if the reads are interleaved.
 addcolon=f              Append ' 1:' and ' 2:' to read names, if not already present.  Please include the flag 'int=t' if the reads are interleaved.
 ihist=<file>            Insert size histograms.  Requires paired reads interleaved in sam file.
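
For illustration, here is roughly what interleaving means, written as a plain-shell sketch rather than as BBMap's own implementation. With BBMap itself the call would be along the lines of `reformat.sh in1=R1.fastq.gz in2=R2.fastq.gz out=interleaved.fastq.gz` (the file names are placeholders). The function name `interleave_fastq` is made up for this example, and it assumes gzipped input, 4-line FASTQ records, and mates listed in the same order in both files.

```shell
# Hypothetical helper: interleave two gzipped FASTQ files on stdout
# (record 1 of R1, record 1 of R2, record 2 of R1, record 2 of R2, ...).
# Assumes 4-line records and identical mate order in both files.
interleave_fastq() {
    r1=$1
    r2=$2
    gzip -cd "$r1" | awk -v r2="$r2" '
        BEGIN { cmd = "gzip -cd \"" r2 "\"" }   # second stream: the R2 mates
        { print }                                # pass each R1 line through
        NR % 4 == 0 {                            # a full R1 record was printed
            for (i = 0; i < 4; i++) {            # now emit the matching R2 record
                if ((cmd | getline line) <= 0) {
                    print "R2 ran out of records" > "/dev/stderr"
                    exit 1
                }
                print line
            }
        }
        END { close(cmd) }
    '
}
```

Usage would look like `interleave_fastq R1.fastq.gz R2.fastq.gz | gzip > interleaved.fastq.gz`. For real data the dedicated tools are the safer choice, since they also handle edge cases this sketch ignores.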

score 3 · Accepted Answer · 2017-09-28

Sounds like you have an odd number of reads in your input fastq file. Discovar De Novo is designed to work with paired-end data (with a specific read length) - you should have an even number of reads if you've got paired-end reads. I'm not too familiar with TruSeq but the file name with "LongRead" in it sounds like it might be an indication that Discovar De Novo isn't the best tool for this assembly.
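
A quick way to confirm whether the record count really is odd, as a minimal sketch. The function names `count_fastq_records` and `has_even_record_count` are made up for this example, and it assumes a gzipped file with strict 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper: print the number of records in a gzipped FASTQ file.
# Assumes exactly 4 lines per record.
count_fastq_records() {
    gzip -cd "$1" | awk 'END { print int(NR / 4) }'
}

# Hypothetical helper: succeed (exit 0) only if the record count is even,
# which is what an interleaved paired-end file should have.
has_even_record_count() {
    n=$(count_fastq_records "$1")
    [ $(( n % 2 )) -eq 0 ]
}
```

For example, `has_even_record_count TSSLR-BDA11_LongRead.fastq.gz || echo "odd number of records"` would tell you whether the file matches what the error message describes.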

Did you start with paired-end reads and trim them in some way? That could have removed some mates, leaving you with an odd number of reads (it shouldn't happen, but some trimming tools do this unless you tell them not to). I often hear that Discovar De Novo works best with completely untrimmed reads: a lot of people have told me to leave in adapter sequences, low-quality bases and short reads. So if you have trimmed your reads, maybe try an assembly with the raw reads and see if you get the same error.
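
If the file is supposed to be interleaved, one rough way to spot a mate that went missing during trimming is to check that consecutive records share a read name. This is only a sketch: `check_interleaved_pairs` is a made-up name, and it assumes a gzipped file, 4-line records, and mates whose names differ at most by a trailing `/1` or `/2` or by whatever follows the first space in the header.

```shell
# Hypothetical helper: exit 0 if every two consecutive records in a gzipped,
# interleaved FASTQ file share a read name; otherwise report the first problem
# and exit 1. Assumes 4-line records.
check_interleaved_pairs() {
    gzip -cd "$1" | awk '
        NR % 4 == 1 {                      # header line of each record
            name = $1                      # read name up to the first whitespace
            sub(/\/[12]$/, "", name)       # drop a trailing /1 or /2
            rec++
            if (rec % 2 == 1) {
                first = name               # remember the first mate
            } else if (name != first) {
                print "mate mismatch at record " rec ": " first " vs " name
                bad = 1
                exit
            }
        }
        END {
            if (!bad && rec % 2 == 1) {    # odd record count: last read has no mate
                print "unpaired final record: " first
                bad = 1
            }
            exit bad + 0
        }
    '
}
```

It stops at the first mismatch, so after one orphaned read everything downstream would look shifted anyway; the reported record number is the place to start looking.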

Maybe irrelevant if you only have the one read file, but just because I think it's a cool tool - SeqTK is also able to do interleaving of fastq files. https://github.com/lh3/seqtk . I'm not advocating its use instead of the BBMap suite - BBMap is awesome and fantastic and I rate it very highly indeed, just throwing some love to SeqTK ;)