How To Find Suitable Reference Genome After Genome Assembly
11.3 years ago
HG ★ 1.2k

Hi everyone. I have assembled the genomes of 20 strains of Listeria monocytogenes. The strains were collected from different host species and different places. Can anybody suggest how to find a suitable reference genome for all 20 assemblies, or do I have to pick a different reference for each genome? What is the best approach to finding a suitable reference genome?

genome • 5.9k views

You mean you want to annotate your genomes now?


Actually, I did the assembly using SPAdes and now have the scaffold files, so I want to see how good the assembly is and how much query coverage it has against a reference genome.

11.3 years ago
Joseph Hughes ★ 3.0k

You could automate a BLAST of your 20 assemblies against the known complete genomes of Listeria monocytogenes. Use the BLAST results to work out the coverage of the closest genome in the database.
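The automated BLAST step suggested above could be scripted roughly as follows. This is only a sketch: the file names, the output directory, and the database name `listeria_refs` are hypothetical, and it assumes a nucleotide database has already been built from the downloaded complete genomes (for example with `makeblastdb -in refs.fasta -dbtype nucl -out listeria_refs`).

```python
import subprocess
from pathlib import Path


def blast_commands(assemblies, db_name, out_dir="blast_out"):
    """Return one blastn argument list per assembly FASTA file.

    Each command writes tabular output (-outfmt 6) to
    <out_dir>/<assembly name>.tsv.
    """
    commands = []
    for fasta in assemblies:
        out_file = Path(out_dir) / (Path(fasta).stem + ".tsv")
        commands.append([
            "blastn",
            "-query", str(fasta),   # contigs/scaffolds of one assembly
            "-db", db_name,         # hypothetical database of complete genomes
            "-outfmt", "6",         # 12-column tabular output
            "-out", str(out_file),
        ])
    return commands


def run_all(commands, out_dir="blast_out"):
    """Run the commands one after another (requires BLAST+ on the PATH)."""
    Path(out_dir).mkdir(exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(blast_commands(["strain01.fasta", "strain02.fasta"], "listeria_refs"))` would then leave one table per assembly to compare against the candidate references.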


I thought of that too, but suppose that after assembly each genome has a different number of contigs. When you BLAST them, you get a separate result for each contig. So what would be suitable: should I consider only the largest contig of each assembly?


I would select it based on the total number of bases across all your contigs that are identical to a particular genome in the database.
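One way to sketch that selection, assuming the BLAST results are in the default tabular format (`-outfmt 6`, where column 2 is the subject ID, column 3 the percent identity, and column 4 the alignment length) and that each reference genome is a single sequence in the database. Identical bases are approximated as percent identity times alignment length, and overlapping hits are not de-duplicated, so treat the totals as a rough ranking rather than an exact count.

```python
from collections import defaultdict


def identical_bases_per_reference(blast_lines):
    """Sum approximate identical bases per subject across all contigs."""
    totals = defaultdict(float)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sseqid = fields[1]
        pident = float(fields[2])   # percent identity of this hit
        length = int(fields[3])     # alignment length of this hit
        totals[sseqid] += pident / 100.0 * length
    return dict(totals)


def best_reference(blast_lines):
    """Return the subject with the largest total, or None if there are no hits."""
    totals = identical_bases_per_reference(blast_lines)
    return max(totals, key=totals.get) if totals else None
```

Because the score is summed over every contig of an assembly, the number of contigs does not matter, and the same function can be applied to each of the 20 BLAST tables in turn.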

11.3 years ago

This information: http://bacteria.ensembl.org/info/about/species.html?search=Listeria+monocytogenes

It lists EGD-e in the taxonomic Compara. This would be my first choice. However, the other assemblies may also be useful for quality-control mapping.


Could you please elaborate a little on why you chose it? Is there a rule?


Not really, just a generic answer to a generic question. The added utility of this genome is that it has been integrated into the comparative genomics analysis by Ensembl.


If I want to build a genome analysis pipeline and have to include this step in the pipeline, how would I select the reference in that case?

11.3 years ago
Rohit ★ 1.5k

If you are trying to annotate it, then for bacterial genomes it is best to use RAST.

http://rast.nmpdr.org/


RAST has problems identifying pseudogenes. Moreover, it is not a reliable source. We have an in-house annotation program that I think works better than RAST: http://www.ncbi.nlm.nih.gov/pubmed/12682369

11.3 years ago
SRKR ▴ 180

I agree with Joseph. When you BLAST, you will find out which genome shows the maximum similarity with all other genomes. That is the one you should take as the reference genome, rather than basing the choice on the size of a contig. Having a huge contig does not necessarily mean it will show maximum similarity with all other genomes.
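To base the choice on similarity across the whole assembly rather than on any single contig, the query coverage against each reference can be pooled over all contigs. The sketch below is one possible way to do that, again assuming default tabular BLAST output (`-outfmt 6`, with query start and end in columns 7 and 8). Overlapping hits on the same contig are merged so that no base is counted twice. The function names are illustrative, not part of any BLAST tooling.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total number of positions covered by 1-based inclusive intervals."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # Gap before this interval: close the previous block.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def covered_bases_per_reference(blast_lines):
    """Sum the covered query bases per reference over all contigs."""
    hits = defaultdict(list)  # (reference, contig) -> query intervals
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        hits[(sseqid, qseqid)].append((min(qstart, qend), max(qstart, qend)))
    covered = defaultdict(int)
    for (sseqid, _contig), intervals in hits.items():
        covered[sseqid] += merged_length(intervals)
    return dict(covered)
```

Dividing each total by the combined length of the assembly's contigs gives a coverage fraction per reference, so a long contig contributes only in proportion to the bases it actually aligns.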


I agree with you if it is a single genome. But in my case I am annotating 20 genomes simultaneously, from different ecological niches, in an automated program that takes reads directly from the sequencer and then assembles and annotates them. So you cannot use a single reference for all the genomes. Moreover, if you have around 10 contigs per genome, you will get 10 BLAST results. Suppose two contigs are very close in length but differ in sequence: each will then show high identity and query coverage with a different genome. That makes it difficult to choose in an automated program. I hope the problem is clearer now.
