Hello Biostars community,
I’m working on coassembling the genome of Desulfovibrio glucosivorans DMSS-1 using sequencing data from both PacBio RS and Illumina HiSeq 2000 platforms. All the reads are available under BioProject accession PRJNA186466.
I read that preprocessing PacBio RS reads typically requires the primary analysis data for use in SMRT Link. However, the Sequence Read Archive (SRA) entry for my project only provides FASTQ files, and I haven’t been able to figure out how to process PacBio RS FASTQ files before coassembly.
Any guidance on how I should preprocess the FASTQ files before coassembly with Unicycler would be greatly appreciated! Thank you in advance for your help!
More than likely the FASTQ files have already been pre-processed. This is a submission from the Joint Genome Institute, which is some additional assurance, and the data is from 2013. You can go ahead and start on your assembly.
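If you want a quick sanity check before assembling, a read-length summary of the PacBio FASTQ can help. This is only a minimal sketch: it assumes plain 4-line FASTQ records (optionally gzipped), and the function names and the file name in the usage note are made up for illustration.

```python
import gzip
from itertools import islice


def read_lengths(path):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The second line of every 4-line record holds the sequence.
        for seq in islice(handle, 1, None, 4):
            yield len(seq.strip())


def length_stats(lengths):
    """Return read count, total bases, mean length, N50 and max length."""
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        return {"reads": 0, "bases": 0, "mean": 0.0, "n50": 0, "max": 0}
    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: length of the read at which the cumulative sum (longest first)
    # reaches half of all bases.
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "reads": len(lengths),
        "bases": total,
        "mean": total / len(lengths),
        "n50": n50,
        "max": lengths[0],
    }
```

Usage would be something like `length_stats(read_lengths("pacbio_reads.fastq.gz"))`, where the file name is a placeholder for whatever you downloaded from SRA. The numbers do not prove the reads were filtered, but they show at a glance what you are feeding into the assembler.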
You could run fastplong on the data to see if it is able to identify any recurring sequences, but that is probably not needed.
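For a rough idea of what "recurring sequences" could look like, here is a crude sketch that counts the most frequent read starts and ends. It is not how fastplong works; the function name and the default values for `k` and `top` are assumptions for illustration, and exact matching will undercount on noisy PacBio RS reads.

```python
from collections import Counter


def common_read_ends(sequences, k=20, top=5):
    """Count the most frequent k-base prefixes and suffixes across reads."""
    starts, ends = Counter(), Counter()
    for seq in sequences:
        seq = seq.strip().upper()
        if len(seq) < k:
            continue  # too short to contribute a full k-mer
        starts[seq[:k]] += 1
        ends[seq[-k:]] += 1
    return starts.most_common(top), ends.most_common(top)
```

If one prefix or suffix shows up in a large share of the reads, that hints at leftover adapter; if the counts are all small, the reads are probably already trimmed.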