Hi,
I work with interleaved FASTQ files, in which read names end in "/1" and "/2" to mark the two reads of each pair.
Assume I have two files containing paired-end (PE) data for library "ABC" and two more files containing mate-pair (MP) data for library "XYZ".
I could:
(1) abyss-pe [..] lib='ABC' mp='XYZ' ABC='abc.A.fq abc.B.fq' XYZ='xyz.A.fq xyz.B.fq' [..]
or I could concatenate both (interleaved) ABC files and both (interleaved) XYZ files and do:
(2) abyss-pe [..] lib='ABC' mp='XYZ' ABC='abc.AB.fq' XYZ='xyz.AB.fq' [..]
Are those two approaches equivalent (in terms of processing and expected results)?
I am not really sure about that as my files are interleaved and I don't want A.fq and B.fq to be considered read pairs ...
I'd obviously prefer (1) as it is clean and simple. Assemblies are currently running, so I don't have any results from trial and error yet.
Thanks,
Sven
This is the correct answer. If you pass only one file per library, ABySS treats that file as interleaved. As h.mon says, it is important to specify the libraries separately (e.g. as ABC1 and ABC2) so that ABySS can correctly estimate the fragment size distribution of each library. (The fragment size is estimated by aligning the read pairs to the assembly contigs.)
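Since this relies on each file really being interleaved, a quick sanity check before assembling may be worthwhile. The following is only a minimal sketch, assuming four-line FASTQ records and the "/1" and "/2" name suffixes described in the question; `check_interleaved` is a made-up helper name and not part of ABySS.

```shell
# Hypothetical helper (not part of ABySS): succeeds only if read names
# alternate <name>/1, <name>/2 throughout the file.
# Assumes plain 4-line FASTQ records (no wrapped sequence lines).
check_interleaved() {
  awk 'NR % 4 == 1 {
         n++
         name = $1
         base = substr(name, 1, length(name) - 2)   # name without the suffix
         tag  = substr(name, length(name) - 1)      # last two characters
         if (n % 2 == 1) {
           if (tag != "/1") { bad = 1; exit }
           prev = base
         } else if (tag != "/2" || base != prev) {
           bad = 1; exit
         }
       }
       END { exit (bad || n % 2) }' "$1"
}
```

Usage would be something like `check_interleaved abc.A.fq && echo "looks interleaved"`. It reads the whole file, so on large lanes it may be worth running it on the first few thousand lines only.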
Thanks, interesting. I have one library sequenced on two or more HiSeq lanes, so do I need to merge the files first (if I want to use interleaved format) before assembling? Does providing two (interleaved) files for the same library make ABySS treat them as the two halves of a paired-end set?
You do not need to merge them:
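A sketch of what that could look like, reusing the file names from the question and the ABC1/ABC2 naming mentioned above. The `name=asm` and `k=64` values are placeholders, and splitting the mate-pair data into XYZ1/XYZ2 is an assumption made by analogy. The command is only printed here so it can be checked before anything is launched.

```shell
# One library name per interleaved file, so no file is paired with another.
# name= and k= are placeholder values; adjust them for your data.
cmd="abyss-pe name=asm k=64 \
  lib='ABC1 ABC2' mp='XYZ1 XYZ2' \
  ABC1='abc.A.fq' ABC2='abc.B.fq' \
  XYZ1='xyz.A.fq' XYZ2='xyz.B.fq'"

echo "$cmd"       # inspect the command first
# eval "$cmd"     # uncomment to actually start the assembly
```

Each lane then gets its own fragment size estimate, at the cost of a slightly longer command line.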
The syntax is really flexible, even if a bit confusing. And yes, I think ABySS will interpret them as paired reads if you provide two files for the same library.
Ah, OK. It is much clearer now. So "lib" refers more or less to one FASTQ file (or pair of files), not necessarily to a real (sequencing) library. This is probably what confused me :-)