Hi,
I am attempting a de novo assembly of a reptilian genome (comparable in size to the human genome) with ALLPATHS-LG. I have two Illumina libraries, one paired-end and one mate-pair. In addition, I have a PacBio library.
I used LoRDEC to correct the errors in the PacBio data, using the Illumina short reads to build the de Bruijn graph. I also ran the trim-split step provided with LoRDEC. My question is: should I use the corrected PacBio reads as is, or the corrected, trimmed and split PacBio reads, as the long reads in the ALLPATHS-LG de novo assembly pipeline?
I am asking because, according to the LoRDEC manual: "The output is the set of corrected reads also in FASTA format. In these corrected sequences: uppercase symbol denote correct nucleotides, while lowercase denote nucleotides left un-corrected."
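To make the difference between the two candidate inputs concrete, here is a minimal Python sketch that relies only on the case convention quoted above. It is not LoRDEC's own trim-split code and may not match what that tool does; the function names and the `min_len` parameter are made up for illustration. It reports how much of each read is uppercase and what remains if every lowercase stretch is dropped.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def corrected_fraction(seq):
    """Fraction of bases written in uppercase (reported as corrected)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.isupper()) / len(seq)


def split_on_uncorrected(seq, min_len=1):
    """Return the uppercase runs of seq, dropping every lowercase stretch.

    min_len is a hypothetical filter: runs shorter than it are discarded.
    """
    return [run for run in re.findall(r"[A-Z]+", seq) if len(run) >= min_len]


if __name__ == "__main__":
    demo = [">read1", "acgtACGTACGT", "aaGGGGcc"]
    for name, seq in read_fasta(demo):
        print(name, corrected_fraction(seq), split_on_uncorrected(seq))
```

Running something like this on the corrected FASTA shows how much sequence and read length the trimmed and split version would give up compared with the reads as is.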
I also plan to improve the correction by iterating it: the corrected PacBio reads (either as corrected or as corrected, trimmed and split FASTA files) would be the input to another round of error correction with a larger k-mer value, and so on. Could anyone tell me whether these steps make sense or, if they do not, suggest an alternative iterative correction protocol?
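The iteration I have in mind can be sketched as below. This is only an illustration under assumptions: the flags (`-i`, `-2`, `-k`, `-s`, `-o`) are the ones I understand `lordec-correct` and `lordec-trim-split` to accept and should be checked against each tool's help output, while the file names, the k values and the solid-k-mer threshold of 3 are placeholders, not recommendations. The function only builds the command lines and does not run anything.

```python
def build_iterative_commands(long_reads, short_reads, k_values,
                             solid_threshold=3, trim_split_last=False):
    """Build one lordec-correct command per k, chaining output to input.

    All names and defaults here are hypothetical. Flags are assumed from
    the LoRDEC documentation and should be verified before use.
    """
    commands = []
    current = long_reads
    for round_no, k in enumerate(k_values, start=1):
        corrected = f"round{round_no}_k{k}.corrected.fasta"
        commands.append([
            "lordec-correct",
            "-i", current,            # long reads to correct in this round
            "-2", short_reads,        # Illumina reads used to build the graph
            "-k", str(k),             # k-mer size for this round
            "-s", str(solid_threshold),
            "-o", corrected,
        ])
        current = corrected           # next round starts from this output
    if trim_split_last:
        # Optional final step: trim and split only after the last round.
        trimmed = current.replace(".corrected.fasta", ".trimmed.fasta")
        commands.append(["lordec-trim-split", "-i", current, "-o", trimmed])
    return commands


if __name__ == "__main__":
    for cmd in build_iterative_commands("pacbio.fasta", "illumina.fasta",
                                        [19, 31, 41], trim_split_last=True):
        print(" ".join(cmd))
```

Each printed command could then be run with `subprocess.run(cmd, check=True)`. In this sketch the untrimmed corrected reads are passed between rounds and trim-split is applied only once at the end; whether that is better than trimming after every round is exactly what I am unsure about.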
Thanks
I have the same question as you. What did you end up doing? Trim, split, both or nothing?