What to do after a genome assembly?
10.0 years ago
ol_ucla ▴ 40

Hi,

What can I do with an assembled genome? That is my main question.

A little more detail:

I have previous experience with RNA-seq alignment, and now I'm learning genome assembly. Currently I am learning SOAPdenovo for assembly, with Velvet and MetaVelvet to follow. The question is: sure, I got a genome assembled, and let's assume I am satisfied with the gap-closing process and the final assembly. What's next?

I read that you can do genome annotation and gene prediction. Anything else?

A side question: what would be a good tool to use for annotation and gene prediction?

Thanks in advance!

next-gen Assembly RNA-Seq • 5.7k views
6

Celebrate :)

2
10.0 years ago
Michael 55k

A good tool chain for annotation is MAKER.

2
10.0 years ago

I can assure you that the N50 value alone is not enough to tell whether your assembly is good or not.

You can run Velvet with different k-mer values on a simple genome (E. coli), then compare the resulting assemblies with trusted genomes using Mauve or ACT, and you will be greatly surprised. In my hands, some assemblies with a higher N50 turned out to be worse than other assemblies with a lower N50.
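As a rough sketch of the k-mer sweep described above, the snippet below only builds the `velveth`/`velvetg` command lines for several k-mer sizes, one output directory per k. The file name `reads_interleaved.fastq`, the `velvet_runs` directory, and the chosen k values are assumptions for illustration, and the options shown (`-fastq`, `-shortPaired`, `-exp_cov auto`, `-cov_cutoff auto`) should be checked against the Velvet manual for your own data.

```python
def velvet_commands(reads, kmers, out_root="velvet_runs"):
    """Return a (velveth, velvetg) pair of argument lists for each k-mer size.

    `reads` is assumed to be an interleaved paired-end FASTQ file
    (hypothetical name); nothing is executed here.
    """
    commands = []
    for k in kmers:
        if k % 2 == 0:
            # Velvet works with odd hash lengths, so reject even values early.
            raise ValueError(f"expected an odd k-mer length, got {k}")
        out_dir = f"{out_root}/k{k}"
        velveth = ["velveth", out_dir, str(k), "-fastq", "-shortPaired", reads]
        velvetg = ["velvetg", out_dir, "-exp_cov", "auto", "-cov_cutoff", "auto"]
        commands.append((velveth, velvetg))
    return commands


if __name__ == "__main__":
    for velveth, velvetg in velvet_commands("reads_interleaved.fastq", [21, 31, 41]):
        print(" ".join(velveth))
        print(" ".join(velvetg))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once Velvet is installed. Each run should leave a `contigs.fa` in its own directory, and those files are what you would then align against a trusted genome in Mauve or ACT.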

You don't mention which organism you assembled. If trusted genomes are available for evaluation, use them before going further.

0

Thanks for your reply.

The fact is, I don't even know what organism I'm working on, other than that it's a bacterium. The genome is very small, judging by the fact that each assembly of the paired-end sample takes only about 10 minutes.

Right now, the other factors I take into consideration are the number of contigs and the maximum contig length.

The person training me said that the number of contigs could be a good indication, as fewer contigs usually means a better assembly. But of course, this does not say anything about whether it was assembled correctly. In your opinion, can the number of contigs or the maximum contig length be used as an evaluation criterion?
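For reference, here is a minimal sketch of how the two numbers mentioned above (number of contigs and maximum contig length) can be read from an assembler's FASTA output. The function names and the tiny inline FASTA are made up for illustration. As noted above, these are contiguity measures only and say nothing about correctness.

```python
def contig_lengths(fasta_lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def basic_stats(lengths):
    """Summarise contig lengths: count, longest contig, total bases."""
    lengths = list(lengths)
    return {
        "n_contigs": len(lengths),
        "max_length": max(lengths, default=0),
        "total_length": sum(lengths),
    }


if __name__ == "__main__":
    # Toy input; with a real assembly, pass an open file handle instead.
    demo = ">contig_1\nACGTAC\nGT\n>contig_2\nGGCC\n".splitlines()
    print(basic_stats(contig_lengths(demo)))
```

With a real file, something like `with open("contigs.fa") as fh: basic_stats(contig_lengths(fh))` would do, where `contigs.fa` is whatever file your assembler writes.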

I've used Mauve, but didn't think of it as a tool for genome validation. Now that you mention it, it makes perfect sense to use it!

0

N50 is computed from all the contig lengths, so it already takes the number of contigs and the maximum contig length into account. N50 is conceptually very easy once you get the point, but it is hard to explain. In my hands, I got better E. coli assemblies with lower N50 values using Velvet.
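Since N50 is easier to show than to explain, here is a small sketch using the usual definition: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The two toy assemblies are invented numbers, chosen only to show that joining contigs raises N50 whether or not the join is correct.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical assemblies of the same 1000 bp of sequence.
    fragmented = [400, 300, 200, 100]
    # Same contigs, but the 400 and 300 bp pieces were joined.
    # N50 cannot tell whether that join is real or a misassembly.
    joined = [700, 200, 100]
    print("fragmented N50:", n50(fragmented))
    print("joined N50:", n50(joined))
```

The second assembly scores higher even though nothing in the calculation checks the join, which is why a comparison against a trusted genome is still needed.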

I would be rather worried about a first draft of my genome if I could not compare it with real, trusted genomes. If that were the case, I would certainly design a robust pipeline for my reads, including mate-pair and long Pacific Biosciences reads.

A colleague of mine needed 7 years to close a Pseudomonas genome ...

1
10.0 years ago
iraun 6.2k

For the genome annotation, I would suggest giving Blast2GO a try.

On the other hand, you can do a gene prediction analysis by running BLAST against a protein database, or something like that...

But first of all, as has been said, you should celebrate ^^

Hope it helps.

0
10.0 years ago
sentausa ▴ 650

Compare it to something(s) else.

0
10.0 years ago
5heikki 11k

Check your original plans: why did you decide to sequence in the first place? What were the research questions?

0

Maybe I should have made this clearer in the first place.

The data set I'm using has already been analyzed and has served its original purpose. I am just re-using the data so I can learn about genome assembly, the ideas behind it and the tools that can be used. So I guess I have already answered the research question along the way?

I am posting to find out what could be done next, or maybe I should rephrase the question as "What can you do with an assembled genome?"

Thanks.

0
10.0 years ago
ol_ucla ▴ 40

Also, I read a bit more about genome assembly after the original post.

I found that, even though N50 is widely used, almost as a standard, to choose the k-mer size, it doesn't indicate whether the genome is assembled correctly. So genome validation becomes an issue.

Following that thought, I looked for validation tools, and I found only one: REAPR.

Does anyone have any experience with it? The manual is a bit hard to follow for some reason.

Any other tools recommended for assembly validation?

Thanks.

0
10.0 years ago
dago ★ 2.8k

Are you dealing with a prokaryote or a eukaryote?

Gene annotation for prokaryotes is done well by Prokka, which uses Prodigal to predict CDSs.

0
10.0 years ago

If your organism is a prokaryote, then, as mentioned above, Prodigal can be used for gene prediction. Apart from that, GeneMarkS would also be a good tool.
