Question

How to perform genome annotation?

0

7.9 years ago

PK ▴ 130

Dear all, I am new to DNA-Seq. I have sequencing data, but the reference genome for my organism (Aspergillus flavus) is not very reliable. I want to perform genome annotation. What is the most trustworthy pipeline (software)? Please help me.

next-gen alignment genome • 2.8k views

ADD COMMENT • link updated 7.9 years ago by Rohit ★ 1.5k • written 7.9 years ago by PK ▴ 130

1

https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/

ADD REPLY • link 7.9 years ago by novice ★ 1.1k

0

Thank you for your answer. I have a question: is it possible to do this manually?

ADD REPLY • link 7.9 years ago by PK ▴ 130

0

No problem. I don't have any experience in genome annotation so hopefully someone else can chime in.

ADD REPLY • link 7.9 years ago by novice ★ 1.1k

score 2 · Answer 1 · 2017-06-07

2

7.9 years ago

Petr Ponomarenko ★ 2.8k

We use the Softberry gene-finding programs: http://www.softberry.com/berry.phtml?topic=index&group=programs&subgroup=gfind. We have also used MAKER (http://www.yandell-lab.org/software/maker.html) and Augustus (http://bioinf.uni-greifswald.de/augustus/).

You may use BUSCO http://busco.ezlab.org to assess quality.
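As a rough illustration of how a BUSCO result might be read programmatically, here is a minimal Python sketch. It assumes the one-line summary notation that BUSCO prints in its short summary (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing, n = number of BUSCO groups searched). The function names, the example numbers, and the thresholds in `looks_complete` are made up for illustration and are not BUSCO recommendations.

```python
import re

# One-line BUSCO summary notation, for example:
#   C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023
SUMMARY_RE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,"
    r"M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages and the group count from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no BUSCO summary found in: %r" % line)
    fields = match.groupdict()
    result = {k: float(v) for k, v in fields.items() if k != "total"}
    result["total"] = int(fields["total"])
    return result


def looks_complete(summary, min_complete=90.0, max_missing=5.0):
    """Hypothetical sanity check; the thresholds are arbitrary assumptions."""
    return (summary["complete"] >= min_complete
            and summary["missing"] <= max_missing)


# Made-up example line, not a real result for any genome.
example = "C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023"
summary = parse_busco_summary(example)
print(summary)
print(looks_complete(summary))
```

A check like this is only a quick filter. A high duplicated fraction or many fragmented BUSCOs can point to assembly or gene-model problems that a single threshold will not catch.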

Also, it is not so much a question of which tool you select. It is mainly about how you use it, how you train its parameters, and how you post-process the results.

ADD COMMENT • link 7.9 years ago by Petr Ponomarenko ★ 2.8k

0

Thank you for your answer. Could you please explain what you mean by training parameters? Do you mean software parameters?

ADD REPLY • link 7.9 years ago by PK ▴ 130

0

Yes, software parameters. Genome annotation software uses different kinds of data, such as the nucleotide composition of different features in nearby species, and can use additional information from your own organism, such as transcriptome data. These datasets are used to adjust the parameters of the statistical models that predict different aspects of the genes.
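To make "training parameters" a bit more concrete, here is a deliberately tiny Python sketch. It is not how Augustus, MAKER, or any other real gene finder works internally. Those tools use much richer models. The sketch only shows the general idea of estimating model parameters (here, single-nucleotide frequencies) from example sequences and then using them to score new sequence. All function names and the toy sequences are made up for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_composition(sequences, pseudocount=1.0):
    """Estimate nucleotide frequencies from training sequences.

    The pseudocount keeps every frequency above zero so that
    log-ratios stay finite.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in BASES)
    total = sum(counts[b] for b in BASES) + len(BASES) * pseudocount
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def log_odds(seq, coding_freqs, background_freqs):
    """Sum of per-base log-likelihood ratios; positive favours 'coding'."""
    return sum(
        math.log(coding_freqs[b] / background_freqs[b])
        for b in seq.upper()
        if b in BASES
    )


# Toy, made-up training data: GC-rich "coding" and AT-rich "intergenic".
coding_examples = ["ATGGCCGCGGGCCTGGCC", "ATGGGCGCCCTGGCGGCC"]
intergenic_examples = ["ATATTTAAATTTATAAAT", "TTTAAATATATTAAATAT"]

# "Training" here means estimating the frequencies from the examples.
coding = train_composition(coding_examples)
background = train_composition(intergenic_examples)

print(log_odds("GCCGGCGCG", coding, background))  # GC-rich query
print(log_odds("ATTTAAATA", coding, background))  # AT-rich query
```

If the training examples come from a distant species with a different base composition, the estimated frequencies will not match your genome and the scores become misleading. That is why training on data from your own organism, or from a close relative, matters.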

ADD REPLY • link 7.9 years ago by Petr Ponomarenko ★ 2.8k

score 2 · Answer 2 · 2017-06-07

2

7.9 years ago

Rohit ★ 1.5k

If you have supporting transcript data, then GeneMark is a good option. Otherwise, as Petr suggested, Augustus and MAKER would be apt. The BRAKER pipeline is well designed for cases where no closely related species has been annotated, since it trains Augustus on your own data.

If you are working with a model organism for which other subspecies are well annotated, you can give GeMoMa a try. Their recent v1.4 looks promising.