I am asking this because I could not find the information in the tool manual or in any tutorial. Can I run the Canu assembler on multiple input files of PacBio reads, running the Correct, Trim and Assemble steps manually, one at a time?
CORRECTION step:-
canu -correct -p plant -d run1 genomeSize=900m -pacbio-raw 1.fasta 2.fasta 3.fasta 4.fasta 5.fasta 6.fasta
output:
plant.correctedReads.fasta.gz
My question is:
In this step the assembler generated only a single corrected file for all 6 input files. Is there any loss of data? Or do I have to run this step individually for each of the 6 files, like this:
canu -correct -p readfile1 -d run1 genomeSize=900m -pacbio-raw 1.fasta
canu -correct -p readfile2 -d run1 genomeSize=900m -pacbio-raw 2.fasta
canu -correct -p readfile3 -d run1 genomeSize=900m -pacbio-raw 3.fasta
canu -correct -p readfile4 -d run1 genomeSize=900m -pacbio-raw 4.fasta
canu -correct -p readfile5 -d run1 genomeSize=900m -pacbio-raw 5.fasta
canu -correct -p readfile6 -d run1 genomeSize=900m -pacbio-raw 6.fasta
TRIMMING step:-
canu -trim -p plant -d run1 genomeSize=900m -pacbio-corrected plant.correctedReads.fasta.gz
output:
It is still running.
Or do I have to run this step like this:
canu -trim -p readfile1 -d run1 genomeSize=900m -pacbio-corrected 1.correctedReads.fasta.gz
canu -trim -p readfile2 -d run1 genomeSize=900m -pacbio-corrected 2.correctedReads.fasta.gz
canu -trim -p readfile3 -d run1 genomeSize=900m -pacbio-corrected 3.correctedReads.fasta.gz
canu -trim -p readfile4 -d run1 genomeSize=900m -pacbio-corrected 4.correctedReads.fasta.gz
canu -trim -p readfile5 -d run1 genomeSize=900m -pacbio-corrected 5.correctedReads.fasta.gz
canu -trim -p readfile6 -d run1 genomeSize=900m -pacbio-corrected 6.correctedReads.fasta.gz
ASSEMBLY step:-
canu -assemble -p plant -d run1 genomeSize=900m correctedErrorRate=0.039 -pacbio-corrected plant.trimmedReads.fasta.gz
'OR'
canu -assemble -p plant -d run1 genomeSize=900m correctedErrorRate=0.039 -pacbio-corrected 1.trimmedReads.fasta.gz 2.trimmedReads.fasta.gz 3.trimmedReads.fasta.gz 4.trimmedReads.fasta.gz 5.trimmedReads.fasta.gz 6.trimmedReads.fasta.gz
My questions are:
1) Will both pipelines (a single run versus individual runs for correction and trimming) generate the same assembly, or different ones?
2) Is there any loss of data with the first pipeline (a single run for correction and trimming)?
Please assist; any help would be appreciated.
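One way to look into the "loss of data" question is to compare how many reads went in with how many came out. The following is a minimal sketch, not part of Canu: `count_fasta_reads` is a hypothetical helper that counts FASTA header lines across plain or gzip-compressed files. A lower count in the corrected output is not by itself a sign of a problem, because, as far as I know, Canu's correction stage discards very short reads and by default only outputs the longest reads up to a target coverage.

```shell
# Hypothetical helper (not a Canu command): count FASTA records, i.e. lines
# starting with '>', summed over one or more plain or gzip-compressed files.
# The body runs in a subshell so its variables do not leak into the caller.
count_fasta_reads() (
    total=0
    for f in "$@"; do
        case "$f" in
            *.gz) n=$(gzip -dc "$f" | grep -c '^>' || true) ;;
            *)    n=$(grep -c '^>' "$f" || true) ;;
        esac
        total=$((total + n))
    done
    echo "$total"
)
```

For example, `count_fasta_reads 1.fasta 2.fasta 3.fasta 4.fasta 5.fasta 6.fasta` gives the total number of raw reads, and `count_fasta_reads plant.correctedReads.fasta.gz` gives the number of corrected reads to compare against it.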
I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.
Thanks, WouterDeCoster.
Many thanks, Lieven, for your valuable suggestion.
I agree with your points, but at the trimming step I have only one file, the correctedReads.gz file produced by the correction step. So in my case the trimming step can only be run once on that single file; there is no option to run it file by file.
One-by-one trimming of reads is only possible if the correction step is also run one-by-one for each individual read file.
Now I will run both approaches and compare the results. Let us see what the differences are; I will share them here.
Thanks again.
Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked. If an answer was helpful you should upvote it; if the answer resolved your question you should mark it as accepted.
True, but you can easily split the correctedReads file into a number of chunks. In any case, Canu will normally do this itself (see my answer below).
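For illustration only, splitting a FASTA file into chunks could be sketched as below. `split_fasta` is a hypothetical helper, not a Canu command, and the output naming (`PREFIX.1.fasta`, `PREFIX.2.fasta`, ...) is an assumption of this sketch. It distributes records round-robin and keeps multi-line sequences together with their header. As noted above, this is normally unnecessary because Canu partitions the work itself.

```shell
# Hypothetical helper: split_fasta N INPUT PREFIX
# Distributes the FASTA records of INPUT (plain or .gz) round-robin over
# N files named PREFIX.1.fasta ... PREFIX.N.fasta.
split_fasta() (
    n=$1
    input=$2
    prefix=$3
    case "$input" in
        *.gz) gzip -dc "$input" ;;
        *)    cat "$input" ;;
    esac | awk -v n="$n" -v prefix="$prefix" '
        # A header line starts a new record: pick the next output file.
        /^>/ { out = sprintf("%s.%d.fasta", prefix, (count++ % n) + 1) }
        # Every line (header or sequence) goes to the current output file.
        { print > out }
    '
)
```

A call such as `split_fasta 6 plant.correctedReads.fasta.gz chunk` would write `chunk.1.fasta` through `chunk.6.fasta` in the current directory.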