Virus sequencing: de novo vs. reference-based assembly
3
1
10.6 years ago
Anna S ▴ 520

Hello,

I will soon be receiving Epstein-Barr virus sequence data with > 800x coverage. There's a nice paper that compares de novo assembly tools (Zhang W, Chen J, Yang Y, Tang Y, Shang J, et al. (2011) A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies. PLoS ONE 6(3): e17915. doi:10.1371/journal.pone.0017915). Based on this paper, I'm thinking of using Edena as the de novo assembly tool, although other tools seem to be good as well (see figure 6 of the paper).

I would like to compare the de novo assembly with a reference-based one. Which reference-based aligner do you recommend for a virus?

Thank you,
Anna

de-novo ngs virus • 5.9k views
ADD COMMENT
2
10.6 years ago

We are developing a Galaxy server that facilitates this process:

http://docs.viramp.com/en/latest/viramp_intro.html#step-by-step-process

The server is here: http://viramp.com/

The paper and software are still under development and may therefore have issues of various kinds; nonetheless, they may already contain useful information for you.

Here is a draft

ADD COMMENT
0

Thank you, Istvan, you're the best! Do you think VIRAMP is ready enough for me to use it or should I follow the recommended steps on my own? Thanks.

ADD REPLY
1

The only caveat is that the server we run there may not have enough storage; it has a few hundred GB.

But we want people to try it. Ideally, all you need to upload are the FASTQ files and the reference genome, and it takes it from there.

Let's see what happens. Also, send feedback and questions to Yinan Wan (yzw128@psu.edu); she is in charge of making this work.

ADD REPLY
0

Istvan, I tried your tool (it's great!!!). The default pipeline ran without any problems!! However, any variation from the default failed for me, whether it was changing the de novo assembler from velvet, or whether it was running some of the de novo tools outside of the pipeline. Does Yinan have viewing privileges to my session? That would make it much easier for her to fix bugs if there are any since everything I did this morning is there including the tasks that failed. In addition, QUAST promised more detail in the download version and I was expecting to see the nice graphs from your example, but they were missing in the download version too.

ADD REPLY
1

Hi Anna,

Thanks for trying out VIRAMP. Could you make an account on VIRAMP and share the history with me? You can refer to this link to learn how to share a history (my account is just the email address listed above). I cannot identify userless datasets and, more importantly, since this is a demonstration platform with limited space, userless datasets are purged after a certain time, whereas datasets associated with an account are kept. If you have a problem sharing the history, please at least make an account and run everything under it so that I can try to identify the datasets in the database.

The QUAST bug has been fixed; thanks for pointing it out. You can also email me any specific questions or issues you encounter during processing.

Best,
Yinan

ADD REPLY
0

Thank you, Yinan, for fixing QUAST!!

I have shared my history with you. As you predicted, my work from this morning was wiped out, but I created an account as you suggested and reproduced the most important problem for me: I cannot run the paired-end pipeline using VICUNA instead of velvet. As you can see from my history, I tried the default kmers, the highest default kmer only (65), the lowest default kmer only (35), and a kmer of 20, and none of them worked with VICUNA.

You will need to scroll down in the history to see the failures. I have found that velvet with k=20 works much better for my data than the default pipeline, so all those successful velvet jobs are at the top so that I can move my project forward.

Thanks a lot!

ADD REPLY
0

Can you please send her an email? Let's see if you can work this out.

ADD REPLY
1
10.6 years ago

I think a direct alignment is preferable to de novo assembly when possible. Since the EBV genome has been sequenced, I would at least try that for an initial analysis. The choice of aligner depends more on the data than on the species, but I would typically use BWA for DNA-Seq analysis (although Bowtie and similar aligners are probably also OK).
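As a rough illustration of the BWA route mentioned above, here is a minimal Python sketch that only builds the command lines for a paired-end alignment (it does not run them). The file names, the thread count, and the helper name `reference_alignment_commands` are placeholders made up for this example, and the `samtools sort -o` form assumes samtools 1.3 or later; adjust to whatever versions you have installed.

```python
import shlex


def reference_alignment_commands(reference, reads_1, reads_2, out_bam, threads=4):
    """Build shell commands for a basic paired-end BWA-MEM alignment.

    Hypothetical helper: it returns command strings instead of running
    them, so it can be inspected without BWA or samtools installed.
    """
    ref, r1, r2, bam = (shlex.quote(str(p)) for p in (reference, reads_1, reads_2, out_bam))
    return [
        # Build the BWA index for the reference genome (done once).
        f"bwa index {ref}",
        # Align the read pairs and write a coordinate-sorted BAM
        # (assumes samtools >= 1.3 for this `sort -o` syntax).
        f"bwa mem -t {int(threads)} {ref} {r1} {r2} | samtools sort -o {bam} -",
        # Index the BAM so it can be viewed or used for variant calling.
        f"samtools index {bam}",
    ]


if __name__ == "__main__":
    # Placeholder file names; replace with your own reference and reads.
    for command in reference_alignment_commands(
        "EBV_reference.fasta", "reads_R1.fastq", "reads_R2.fastq", "ebv_aligned.bam"
    ):
        print(command)
```

Printing the commands first makes it easy to check them by eye before pasting them into a shell or a job script.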

I think it might be worth trying out this pipeline:

http://genomics-pubs.princeton.edu/prv/scripts.shtml

One caveat is that you'll want to subset your data. I've worked with similar coverage data before, and I needed to extract 1/3 of the reads to get SSAKE to work (which should still be plenty).
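For the subsetting step, a minimal sketch in Python is below. It keeps each read pair with a fixed probability (1/3 here, matching the fraction mentioned above) and writes both mates together so the two files stay in sync. It assumes plain 4-line FASTQ records with mates in the same order in both files; the function names and the seed are just choices for this example, not part of any of the pipelines discussed in this thread.

```python
import random
from itertools import islice


def fastq_records(handle):
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(in_1, in_2, out_1, out_2, fraction=1 / 3, seed=0):
    """Keep each read pair with probability `fraction`; return pairs kept.

    All arguments except `fraction` and `seed` are open text handles.
    Note: zip() stops at the shorter input, so files of unequal length
    are silently truncated to the shorter one.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = 0
    for rec_1, rec_2 in zip(fastq_records(in_1), fastq_records(in_2)):
        if rng.random() < fraction:
            out_1.writelines(rec_1)
            out_2.writelines(rec_2)
            kept += 1
    return kept
```

To use it on real files, open the two inputs for reading and the two outputs for writing (for example with `open(path)` and `open(path, "w")`) and pass the four handles in. The number of pairs kept will be close to, but not exactly, the requested fraction.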

Actually, Moriah (the person who developed that pipeline) is at Penn State now. Not sure if this will eventually be incorporated into VIRAMP - I noticed her name on the manuscript draft.

ADD COMMENT
2

Yes, it is an improvement over those methods. We are benchmarking against them and produce a better assembly with about 100 times fewer resources.

ADD REPLY
0

The pipeline you recommend looks great! Thanks a lot!

ADD REPLY
0
10.6 years ago
Anna S ▴ 520

I have just run VICUNA outside of VIRAMP. It's FAST with the default kmer size (15), but the quality of the results was inferior to velvet's for my data set.
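For anyone who wants a quick number when comparing contig sets like this, here is a small Python sketch that computes contig lengths and N50 from FASTA text. N50 is one of the metrics QUAST reports, but this is only an illustration and not QUAST's code; the function names are made up for the example. Keep in mind that N50 measures contiguity only and says nothing about whether the contigs are correct.

```python
def contig_lengths(fasta_lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases.

    Returns 0 for an empty assembly.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

Running `n50(contig_lengths(open(path)))` on each assembler's contig file gives one value per assembly, which can be compared side by side along with the number of contigs and the total assembled length.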

ADD COMMENT
