We are trying to improve an existing draft genome of a Planaria species. The draft is currently about 1.52 Gbp. We used ARCS for this, and the result is fairly good, but only the scaffolds grew; the total size of the draft genome did not change. That was not unexpected, because ARCS uses the 10X data only to re-scaffold the genome and does not insert 10X reads into the genome sequence. It seems that we need another tool.
Does anybody know of a tool that takes a draft genome and 10X reads and increases the total size of the genome, rather than only building bigger scaffolds?
Thanks for any response.
If you have adequate coverage of 10X reads, get an assembly from Supernova; chances are that it'll be more contiguous anyway.
Otherwise, in order to use arcs here are the steps that I take:
AFAIK, scaffolding doesn't necessarily have to increase the genome's length. All it does is to see whether the contigs can be linked together or spanned using the dataset.
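To illustrate the point about length, here is a toy sketch in Python. The contig sequences and the 100-N gap size are made up for the example and are not ARCS output. Joining contigs into a scaffold lowers the sequence count and adds only N padding, so the number of real bases stays the same.

```python
# Toy illustration: scaffolding orders and joins contigs with runs of N.
# The contigs and the gap size are invented for this example.

def scaffold(contigs, gap_size=100):
    """Join contigs in the given order, separated by runs of N."""
    return ("N" * gap_size).join(contigs)


def ungapped_length(seqs):
    """Count bases that are not N across all sequences."""
    return sum(len(s) - s.upper().count("N") for s in seqs)


contigs = ["ACGT" * 50, "GGCA" * 25, "TTAG" * 75]  # 200, 100 and 300 bp
scaffolds = [scaffold(contigs)]

print(len(contigs), "contigs ->", len(scaffolds), "scaffold")
print("ungapped bases before:", ungapped_length(contigs))
print("ungapped bases after: ", ungapped_length(scaffolds))
print("scaffold length including gaps:", len(scaffolds[0]))
```

Getting more real sequence into the assembly would need a step that replaces those N runs with bases, which is a different job from scaffolding.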
Ninja Edit: Quoting is not the same across websites.
Thank you for sharing these instructions, harish! I can't believe how long it's taken me to find even this level of instruction on how to use ARCS. I even contacted the authors and they basically told me to sort it out myself! I am very grateful to you.
Ahh, no worries :)
I know the pain of doing this time and time again :)
Besides that's half the reason I asked you to run scaff10x. Arcs doesn't really seem to behave well in my hands :(
I've just given Scaff10x a go, and got it running. It exited cleanly and produced all the expected output. It appears to have done no scaffolding whatsoever - the number of sequences in the output "scaffolded" assembly is identical to the number in the draft assembly. I'm probably doing something wrong... at least, I hope I am! I can't find any solid documentation for Scaff10x outside of the very limited Readme provided on GitHub. If anyone has any better documentation to hand, or a step-by-step tutorial, I'd be so grateful!
If you have the SAM files on hand, try changing the block size and the edge length to approximately 2/3 to 4/3 of your molecule length. That seems to give the most improvement tbh.
What is your molecule length though? I generally set both values to 50000, which has tended to give good results. Also try changing the minimum links parameter.
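As a quick worked example of the 2/3 to 4/3 rule of thumb, here is a small Python sketch. The 50 kb molecule length is only an assumed input and `window_range` is a hypothetical helper name; substitute the molecule length estimated for your own library.

```python
def window_range(molecule_length, low=2 / 3, high=4 / 3):
    """Return (min, max) sizes spanning 2/3 to 4/3 of the molecule length."""
    return round(molecule_length * low), round(molecule_length * high)


# Assumed molecule length of 50 kb, for illustration only.
smallest, largest = window_range(50_000)
print(f"try block size / edge length between {smallest} and {largest}")
```

With a 50 kb molecule length, the range comes out at roughly 33 kb to 67 kb, so 50000 sits in the middle of it.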
I'll bang out a step by step by this weekend :)
Hi @harish
Sorry to single you out like this but you're the only person I've encountered here who seems likely to be able to help. I've had no luck with Scaff10x, and I've made a post asking for help. Posting it here firstly in the hope that you see it and have something to say, and secondly in case anyone else does too! I hope this type of cross-posting / post-linking is ok...
Help scaffolding an Illumina genome assembly with 10x reads
Alright, can you tell me which version of Scaff10X you are using? A newer one came out in the past few days. It seems to fix a couple of issues, but I haven't had the opportunity to try it, so I can't really comment on it.
The following are the couple of steps that I generally follow:
Or skip steps 1-3, align with longranger, and pass the BAM directly.
But hey, there's a newer version out, so try that.
Hi harish
Thanks so much for replying. I'm using the newest version of Scaff10x.
I haven't tried reading the gzipped fastq files via zcat, but I've had no issues running the program "normally" with gzipped fastq files - the program runs and exits cleanly, but just doesn't do anything (n contigs in = n scaffolds out, N50 in = N50 out).
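For anyone wanting to make the same "n in = n out, N50 in = N50 out" comparison, here is a minimal Python sketch that reports sequence count, total length and N50 for a FASTA file. The function names are my own, and the file paths passed on the command line are whatever your draft and scaffolded assemblies are called.

```python
import gzip
import sys


def read_lengths(fasta_path):
    """Return the length of every sequence in a plain or gzipped FASTA file."""
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    lengths, current = [], None
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def summarize(fasta_path):
    """Collect the three numbers worth comparing before and after scaffolding."""
    lengths = read_lengths(fasta_path)
    return {"sequences": len(lengths), "total_bp": sum(lengths), "N50": n50(lengths)}


if __name__ == "__main__":
    # Pass any number of FASTA files, e.g. the draft and the scaffolded output.
    for path in sys.argv[1:]:
        print(path, summarize(path))
```

If both files give identical numbers, the scaffolder really did join nothing, which points at the inputs or parameters rather than at the reporting.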
I've tried running the scaff_reads program but it produced a very wrong output (loads of reads went missing and the R1 and R2 files produced had different numbers of reads).
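A mismatch like this is easy to confirm independently. Below is a rough Python sketch that counts records in each FASTQ file and compares the two. It assumes standard four-line FASTQ records with no wrapped sequence lines and no trailing blank lines, and the function names are hypothetical.

```python
import gzip
import sys


def count_fastq_reads(path):
    """Count records in a plain or gzipped FASTQ file, assuming 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def pairs_match(r1_path, r2_path):
    """True when both files of a pair contain the same number of reads."""
    return count_fastq_reads(r1_path) == count_fastq_reads(r2_path)


if __name__ == "__main__" and len(sys.argv) == 3:
    r1, r2 = sys.argv[1], sys.argv[2]
    print("R1 reads:", count_fastq_reads(r1))
    print("R2 reads:", count_fastq_reads(r2))
    print("counts match:", pairs_match(r1, r2))
```

Running it on the original 10x files as well as on the processed ones shows how many reads were dropped along the way.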
I've also tried running scaff10x on a bam file produced by LongRanger, and that segfaulted all over the place. I've put the details in the post I linked to in my last comment.
Have you seen the newer implementation of ARCS? I think they might have fixed a few memory issues. Unfortunately, the manual is still quite sparse.
Thanks for the suggestion, I'll take a look. The documentation is my issue, though, it's so poor that I don't really understand what to do or even where to start.
The last release was 8 months ago...
I have just tried running it myself and everything works nicely. Just be sure that: you have the right input (specified on the arks page); you have installed all the dependencies and made them available in $PATH (LINKS (with Bloom filter), tigmint (plus bedtools, samtools and bwa)); and lastly, run ~/arks/Examples/arks-make to proceed with the pipeline (try --help for all the possible options).
It seems you want to do gap closing. There are a number of tools available, such as Sealer, FGAP and GapCloser (part of SOAPdenovo), among others.
If you have the original sequencing runs + your newer 10x libraries, you should consider assembling de novo.
Hi Mostafa
(Apologies first for posting this here, I can't work out how to send you a private message directly).
I want to use ARCS to scaffold a draft genome. I have the program installed but its documentation is too minimal for me to get going. So far I have made an interleaved fastq file of my 10x reads using longranger basic. When I call the arcs help file, it makes no mention of an interleaved file, and instead says it expects to be pointed to an alignment file. Do I align my interleaved file to my draft assembly, then run ARCS? Or am I barking up the wrong tree?
Thanks in advance and apologies again for commenting with this question.
I'm sorry for the delay in my response. If it works for you please contact me on Shahhosseini.94@gmail.com in order to discuss your questions.
Yup, there are basically two options. One is to use longranger to align and get a BAM.
The other is to align with bwa mem -p and sort the output by read name; I think the relevant command is samtools sort -n.
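Here is a hedged sketch of the second route, written as a small Python helper that only builds and prints the commands. The file names, the thread count and the `alignment_commands` function are placeholders of mine. How ARCS expects the barcode to be carried (in the read name or in a tag) may depend on the version, so check its README before relying on this.

```python
import shlex


def alignment_commands(draft_fasta, interleaved_fastq, out_bam, threads=8):
    """Build bwa/samtools commands for a name-sorted BAM of interleaved reads."""
    index_cmd = ["bwa", "index", draft_fasta]
    # -p tells bwa mem that the single FASTQ holds interleaved read pairs.
    align_cmd = ["bwa", "mem", "-p", "-t", str(threads), draft_fasta, interleaved_fastq]
    # -n sorts by read name instead of by coordinate; "-" reads from stdin.
    sort_cmd = ["samtools", "sort", "-n", "-@", str(threads), "-o", out_bam, "-"]
    return index_cmd, align_cmd, sort_cmd


# Placeholder file names, for illustration only.
index_cmd, align_cmd, sort_cmd = alignment_commands(
    "draft.fa", "barcoded.interleaved.fastq.gz", "aligned.sortedbyname.bam"
)
print(shlex.join(index_cmd))
print(shlex.join(align_cmd) + " | " + shlex.join(sort_cmd))
```

The printed lines can be pasted into a shell once the paths are adjusted. Nothing is executed by the script itself.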