Question

What Are You Using For A Reference Assembler?

1

12.5 years ago

diltsjeri ▴ 470

I need some information on reference assemblers. What are you using, and which reference assembler do you prefer?

reference • 5.2k views

ADD COMMENT • link updated 12.5 years ago by Lee Katz ★ 3.2k • written 12.5 years ago by diltsjeri ▴ 470

1

What is a "reference assembler"? One that uses a reference genome, or one that you want to use as a reference for comparison with others?

ADD REPLY • link 12.5 years ago by Neilfws 49k

0

Also: similar question, same user: http://www.biostars.org/post/show/44956/ion-torrent-reference-assembly/. Best to avoid posting multiple, highly-similar questions.

ADD REPLY • link 12.5 years ago by Neilfws 49k

score 2 · Answer 1 · 2012-05-15

2

12.5 years ago

Lee Katz ★ 3.2k

AMOScmp-shortreads is working well for me, but it takes a bit longer.

ADD COMMENT • link 12.5 years ago by Lee Katz ★ 3.2k

0

What files are you using to do the message file conversion with toAmos?

ADD REPLY • link 12.5 years ago by diltsjeri ▴ 470

0

I have been using toAmos_new (available in the newer AMOS versions only) to convert the fastq to a bnk. I then start AMOScmp at step 20 with -s 20, which tricks it into starting from a bnk file instead of an afg file. It's buried in my script, but I think it's something like:

toAmos_new -Q run.fastq -t SANGER -b amos.bnk
AMOScmp-shortreads -s 20 amos

You'll need amos.1con and amos.bnk in the same directory for this to work. You can use "amos" or any other prefix, but it must be the same between files.

ADD REPLY • link 12.5 years ago by Lee Katz ★ 3.2k

1

What's the advantage of starting with a bnk file? Also, are the options you posted above new to toAmos_new? I'm starting a pipeline with Ion Torrent data, so all I have is an sff. I use sff_extract to get the fasta, qual, and xml files, and I was going to use toAmos to convert them to afg. Should I not?

ADD REPLY • link 12.5 years ago by diltsjeri ▴ 470

0

The only advantage is that toAmos_new can read in a fastq file and therefore you skip 1) converting to fasta/qual and then 2) converting to afg. Internally, AMOScmp converts first to a bnk anyway and doesn't use the afg anymore.

ADD REPLY • link 12.5 years ago by Lee Katz ★ 3.2k

0

Thanks! This is really helpful.

ADD REPLY • link 12.5 years ago by diltsjeri ▴ 470

0

After validating it, AMOScmp unfortunately does not perform as well as I thought it should. I had a few more false-positives than when I worked with bowtie2. Sorry to do this, but I withdraw this recommendation in favor of newer tools. BWA came in as a close second to bowtie2 and was still better than AMOScmp.

Edit: I mean AMOScmp-shortReads.

ADD REPLY • link 12.5 years ago by Lee Katz ★ 3.2k

0

Thanks for following up.

ADD REPLY • link 12.5 years ago by diltsjeri ▴ 470

0

Bowtie doesn't output contigs though, correct? I need my reads to be assembled.

ADD REPLY • link 12.5 years ago by diltsjeri ▴ 470

1

You'll have to follow up with samtools and "vcfutils.pl vcf2fq".

I am writing a script to automate this step but it is not finalized yet.

ADD REPLY • link 12.5 years ago by Lee Katz ★ 3.2k

0

My reference sequence (to be indexed) has ambiguous nucleotides. This is apparently not supported by bowtie. Have you run into this problem, and if so, how did you work around it? I can't just replace those V, H, etc. with a specific nucleotide because that would bias the alignment.

I noticed bowtie-build offers an --ntoa option, but that just changes all Ns to As. Wouldn't that create a bias? Also, I have other ambiguity codes like V and H, as stated above, which --ntoa wouldn't fix.

And I have some gaps :(

ADD REPLY • link 12.5 years ago by diltsjeri ▴ 470

0

Replace all the ambiguous letters with N. You can do that with sed, or in a text editor like vim. I'd use sed to get rid of the - gap characters as well. Putting in an A will create a bias.

bwa will not crash on a genome like that. I'm pretty sure it will treat them all like N's.
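
A minimal sketch of the sed approach described above, assuming a plain FASTA reference in which the IUPAC ambiguity codes are R, Y, S, W, K, M, B, D, H and V. The function name `mask_ambiguous` and the file names are placeholders, not part of any tool. Header lines (starting with `>`) are left untouched, and case is preserved.

```shell
# mask_ambiguous: read FASTA on stdin, write masked FASTA on stdout.
# On sequence lines only (anything not starting with '>'):
#   - IUPAC ambiguity codes become N (or n for lowercase)
#   - '-' gap characters are deleted
mask_ambiguous() {
  sed -e '/^>/!s/[RYSWKMBDHV]/N/g' \
      -e '/^>/!s/[ryswkmbdhv]/n/g' \
      -e '/^>/!s/-//g'
}

# Usage (hypothetical file names):
#   mask_ambiguous < reference.fa > reference.masked.fa
```

Note that deleting gap characters shifts the coordinates of everything downstream of the gap, so positions in the masked file will no longer match the original gapped sequence.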

ADD REPLY • link 12.5 years ago by swbarnes2 14k

0

Bowtie won't take Ns, I wish it did ;(

ADD REPLY • link 12.4 years ago by diltsjeri ▴ 470

score 1 · Answer 2 · 2012-05-15

1

12.5 years ago

Nikolay Vyahhi ★ 1.3k

Bowtie
BWA

ADD COMMENT • link 12.5 years ago by Nikolay Vyahhi ★ 1.3k

2

Those are aligners. Assemblers are programs like Velvet.

ADD REPLY • link 12.5 years ago by swbarnes2 14k

0

Velvet is a de novo assembler. If you need to assemble against a reference, then you need an aligner.

ADD REPLY • link 12.5 years ago by Nikolay Vyahhi ★ 1.3k

0

This seems to be an ongoing debate on this forum.

ADD REPLY • link 12.5 years ago by diltsjeri ▴ 470

0

I need my reads to be assembled based on a reference. With tools like Bowtie and BWA I get the percentage aligned and I can see the aligned regions, but the reads are not being assembled based on the reference. I believe this is the difference between the two.

ADD REPLY • link 12.5 years ago by diltsjeri ▴ 470

1

After alignment, you can construct (assemble) a consensus sequence from the BAM/SAM file using samtools: http://samtools.sourceforge.net/cns0.shtml
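
A rough sketch of that consensus step, using the legacy samtools/bcftools 0.1.x syntax from the era of this thread. The function name `consensus_from_bam` and the file names are placeholders, and the BAM is assumed to be coordinate-sorted. Newer bcftools releases replaced `bcftools view -cg` with `bcftools call`, so check which version you have installed before relying on this.

```shell
# consensus_from_bam REF.fa ALN.bam
# Writes a FASTQ consensus to stdout.
# Requires samtools, bcftools (0.1.x syntax) and vcfutils.pl on PATH.
consensus_from_bam() {
  samtools mpileup -uf "$1" "$2" |  # pileup of reads against the reference
    bcftools view -cg - |           # call the consensus genotype per site
    vcfutils.pl vcf2fq              # convert the calls to a FASTQ consensus
}

# Usage (hypothetical file names):
#   consensus_from_bam ref.fa aln.sorted.bam > cns.fq
```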

ADD REPLY • link 12.5 years ago by Nikolay Vyahhi ★ 1.3k

0

How can we have the SNPs and indels replaced, and have the uncovered regions of the genome represented as a series of Ns, in the consensus assembly?