Enhancing draft genome using 10X data
6.3 years ago
Mostafa ▴ 20

We are trying to improve an existing draft genome of a species of Planaria. The current draft genome is about 1.52 Gbp. We used ARCS to improve it, and the result is fairly good, but only the scaffolds were extended; the total size of the draft genome did not change. This wasn't unexpected, because ARCS uses 10X data to re-scaffold the genome and does not insert 10X reads into the genome sequence. It seems that we need another tool.

Does anybody know of a tool that takes a draft genome and 10X reads and increases the total size of the genome, rather than only producing bigger scaffolds?

Thanks for any response.

10X draft_genome Assembly • 3.4k views

Does anybody know of a tool that takes a draft genome and 10X reads and increases the total size of the genome, rather than only producing bigger scaffolds?

If you have adequate coverage of 10X reads, get an assembly from Supernova; chances are that it'll be more contiguous anyway.

Otherwise, here are the steps I take to use ARCS:

  1. Interleave the fastq
  2. Append the barcodes to the read header (with an underscore between the original header and barcode)
  3. Use bwa to map the interleaved fastq and obtain a bam
  4. Use samtools to sort the bam by read name (samtools sort -n); this is the most important step
  5. Check the arcs-master/Examples/pipeline_example.sh to see the requisite inputs or run Scaff10x
  6. Repeat steps 3-5 iteratively.
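
Steps 1 and 2 can be sketched in Python roughly as follows. This is a minimal illustration, not the ARCS authors' code. It assumes the barcode is the first 16 bases of R1 (the `barcode_len` parameter is an assumption; check your own library prep), and the function names are made up for this example.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record as (name without the leading '@', sequence, quality).
Record = Tuple[str, str, str]


def base_name(name: str) -> str:
    """Drop any header comment and a trailing /1 or /2 so both mates share a name."""
    name = name.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def tag_and_interleave(
    r1: Iterable[Record], r2: Iterable[Record], barcode_len: int = 16
) -> Iterator[str]:
    """Yield interleaved FASTQ lines with '_BARCODE' appended to each read name.

    ASSUMPTION: the barcode is the first `barcode_len` bases of R1.
    The barcode is trimmed from R1; R2 is written unchanged.
    """
    for (n1, s1, q1), (n2, s2, q2) in zip(r1, r2):
        name = base_name(n1)
        if name != base_name(n2):
            raise ValueError(f"mates out of sync: {n1!r} vs {n2!r}")
        tagged = f"{name}_{s1[:barcode_len]}"
        # R1 first, then R2, four lines each.
        yield f"@{tagged}"
        yield s1[barcode_len:]
        yield "+"
        yield q1[barcode_len:]
        yield f"@{tagged}"
        yield s2
        yield "+"
        yield q2
```

Writing the yielded lines to a file, one per line, gives an interleaved fastq in which both mates carry the same barcode-tagged name.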

AFAIK, scaffolding doesn't necessarily have to increase the genome's length. All it does is check whether the contigs can be linked together or spanned using the dataset.

  1. If yes, it tries to span the gap unambiguously.
  2. Otherwise, if the gap size can be estimated, it inserts "N"s as placeholders until unambiguous bases can be filled in. These N's can represent any number of unspanned bases. Check the scaffolder manual to be sure.
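
To see the difference between "bigger scaffolds" and "a longer genome" in numbers, a small Python sketch like the one below can be run on the assembly before and after scaffolding. It is only an illustration; the function names are invented here, and the sequences are passed in as plain strings rather than read from a FASTA file.

```python
from typing import Dict, Iterable, List


def n50(lengths: List[int]) -> int:
    """Length of the sequence at which half of the total length is reached."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(sequences: Iterable[str]) -> Dict[str, int]:
    """Summarise an assembly given its sequences as strings."""
    seqs = list(sequences)
    lengths = [len(s) for s in seqs]
    gap_bases = sum(s.upper().count("N") for s in seqs)
    return {
        "sequences": len(seqs),
        "total_length": sum(lengths),
        "gap_bases": gap_bases,
        # Real (non-N) bases: scaffolding alone should leave this unchanged.
        "ungapped_length": sum(lengths) - gap_bases,
        "n50": n50(lengths),
    }
```

If scaffolding joined anything, "sequences" drops and "n50" rises, while "ungapped_length" stays the same; only gap closing or reassembly changes the number of real bases.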

Ninja Edit: Quoting is not the same across websites.

Thank you for sharing these instructions, harish! I can't believe how long it's taken me to find even this level of instruction on how to use ARCS. I even contacted the authors and they basically told me to sort it out myself! I am very grateful to you.

Ahh, no worries :)

I know the pain of doing this time and time again :)

Besides that's half the reason I asked you to run scaff10x. Arcs doesn't really seem to behave well in my hands :(

I've just given Scaff10x a go, and got it running. It exited cleanly and produced all the expected output. It appears to have done no scaffolding whatsoever - the number of sequences in the output "scaffolded" assembly is identical to the number in the draft assembly. I'm probably doing something wrong... at least, I hope I am! I can't find any solid documentation for Scaff10x outside of the very limited Readme provided on GitHub. If anyone has any better documentation to hand, or a step-by-step tutorial, I'd be so grateful!

If you have the sam files on hand, try changing the block size and the edge length to approximately 2/3 to 4/3 of your molecule length. That seems to give the most improvement tbh.

What is your molecule length though? I generally take the values to be 50000 and 50000, which has tended to give good results. Also try changing the minimum links parameter.
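
As a worked example of the 2/3 to 4/3 rule of thumb, here is a tiny Python sketch. The function name is hypothetical and it does not map onto any specific Scaff10x option; it only does the arithmetic for a molecule length you have estimated yourself.

```python
from typing import Tuple


def parameter_window(molecule_length: int) -> Tuple[int, int]:
    """Return (low, high) = roughly 2/3 and 4/3 of the molecule length, in bases."""
    low = (2 * molecule_length) // 3
    high = (4 * molecule_length) // 3
    return low, high
```

For a 60 kb molecule length this gives a window of 40000 to 80000, so the 50000 mentioned above sits inside it.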

I'll bang out a step by step by this weekend :)

Hi @harish

Sorry to single you out like this but you're the only person I've encountered here who seems likely to be able to help. I've had no luck with Scaff10x, and I've made a post asking for help. Posting it here firstly in the hope that you see it and have something to say, and secondly in case anyone else does too! I hope this type of cross-posting / post-linking is ok...

Help scaffolding an Illumina genome assembly with 10x reads

Alright, can you tell me which version of Scaff10X you are using? A newer one came out in the past few days. It seems to fix a couple of issues, but I haven't had the opportunity to try it, so I can't really comment on it.

The following are the couple of steps that I generally follow:

  1. Extract the gzipped R1 and R2 into individual files (i.e. zcat R1.fq.gz > R1.fq and zcat R2.fq.gz > R2.fq). This is probably the most important step, and it is where the segfaulting might occur at times if you let scaff10x do it.
  2. Debarcode R1.fq using scaff_BC-reads-1, then use its other output file (the one that is not a fastq) to remove the barcodes from R2.fq
  3. Map the debarcoded reads using BWA (I prefer this over smalt) to get a sam. I honestly don't remember which sort order it expects; if it throws an error, sort it the other way.

Or skip 1-3 and align through longranger and give the bam directly.

  4. Run scaff10x now and hope that you don't get a segfault. I had the same problem, and the devs didn't respond to my mail

But hey, there's a newer version out, so try that.

Hi harish

Thanks so much for replying. I'm using the newest version of Scaff10x.

I haven't tried reading the gzipped fastq files via zcat, but I've had no issues running the program "normally" with gzipped fastq files - the program runs and exits cleanly, but just doesn't do anything (n contigs in = n scaffolds out, N50 in = N50 out).

I've tried running the scaff_reads program but it produced a very wrong output (loads of reads went missing and the R1 and R2 files produced had different numbers of reads).
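
A quick way to confirm that kind of mismatch is to count the records in each file. The sketch below is a minimal Python check, assuming standard four-line FASTQ records; the function names are made up for this example and it is not part of Scaff10x.

```python
import gzip
from typing import Tuple


def open_text(path: str):
    """Open a plain or gzipped text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fastq_records(path: str) -> int:
    """Count records, ASSUMING every record is exactly four lines."""
    with open_text(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def compare_pairs(r1_path: str, r2_path: str) -> Tuple[int, int, bool]:
    """Return (R1 count, R2 count, whether the counts match)."""
    n1 = count_fastq_records(r1_path)
    n2 = count_fastq_records(r2_path)
    return n1, n2, n1 == n2
```

Running compare_pairs on the files before and after debarcoding shows how many reads were dropped and whether R1 and R2 are still in step.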

I've also tried running scaff10x on a bam file produced by LongRanger, and that segfaulted all over the place. I've put the details in the post I linked to in my last comment.

Have you seen the newer implementation of arcs? I think they might have fixed a few memory issues. Unfortunately, the manual is still quite sparse.

Thanks for the suggestion, I'll take a look. The documentation is my issue, though; it's so poor that I don't really understand what to do or even where to start.

The last release was 8 months ago...

I have just tried running it myself and everything is working nicely. Just be sure that you have the right input (specified on the arks page) and that all the dependencies are installed and available in $PATH (LINKS (with bloom filter), tigmint (+ bedtools, samtools, bwa)). Then run ~/arks/Examples/arks-make to proceed with the pipeline (try --help for all the possible options).

It seems you want to do gap closing. Several tools are available for this, such as Sealer, FGAP, and GapCloser (part of SOAPdenovo), among others.

If you have the original sequencing runs + your newer 10x libraries, you should consider assembling de novo.

Hi Mostafa

(Apologies first for posting this here, I can't work out how to send you a private message directly).

I want to use ARCS to scaffold a draft genome. I have the program installed but its documentation is too minimal for me to get going. So far I have made an interleaved fastq file of my 10x reads using longranger basic. When I call the arcs help file, it makes no mention of an interleaved file, and instead says it expects to be pointed to an alignment file. Do I align my interleaved file to my draft assembly, then run ARCS? Or am I barking up the wrong tree?

Thanks in advance and apologies again for commenting with this question.

I'm sorry for the delay in my response. If it works for you please contact me on Shahhosseini.94@gmail.com in order to discuss your questions.

Yup, there are two options basically. Use longranger to align and get a bam.

The other option is to align the interleaved fastq with bwa mem -p and sort the output by read name. I think the relevant option is samtools sort -n.
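
The second option can be sketched as a small Python wrapper around the two tools. This is only an illustration under some assumptions: bwa and samtools are on $PATH, the draft assembly has already been indexed with bwa index, and the function names, file names, and thread count are placeholders.

```python
import subprocess
from typing import List, Tuple


def build_commands(
    draft_fasta: str, interleaved_fastq: str, out_bam: str, threads: int = 4
) -> Tuple[List[str], List[str]]:
    """Build the bwa and samtools command lines without running them."""
    # -p tells bwa mem that the single input file holds interleaved pairs.
    bwa_cmd = ["bwa", "mem", "-p", "-t", str(threads), draft_fasta, interleaved_fastq]
    # -n sorts by read name; the trailing "-" reads the alignments from stdin.
    sort_cmd = ["samtools", "sort", "-n", "-@", str(threads), "-o", out_bam, "-"]
    return bwa_cmd, sort_cmd


def align_and_name_sort(
    draft_fasta: str, interleaved_fastq: str, out_bam: str, threads: int = 4
) -> None:
    """Pipe bwa mem into samtools sort -n and raise if either step fails."""
    bwa_cmd, sort_cmd = build_commands(draft_fasta, interleaved_fastq, out_bam, threads)
    bwa = subprocess.Popen(bwa_cmd, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort_cmd, stdin=bwa.stdout)
    bwa.stdout.close()  # let bwa receive SIGPIPE if samtools exits early
    sort_rc = sorter.wait()
    bwa_rc = bwa.wait()
    if bwa_rc != 0 or sort_rc != 0:
        raise RuntimeError(f"bwa exited with {bwa_rc}, samtools with {sort_rc}")
```

Calling align_and_name_sort("draft.fa", "reads_interleaved.fq", "aligned.namesorted.bam") would then produce a read-name-sorted bam of the kind described above.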
