Implement Ensembl gene annotation pipeline for my assembly
6.0 years ago
Mbillah ▴ 140

I'm new to gene annotation. Can anyone help me implement the Ensembl gene annotation pipeline for my data? Is Ensembl web-based, or is there a Linux package? Can I run it on my own server, or only on their website? Can anyone share a tutorial link?

TIA

gene annotation ensembl • 2.7k views

It is unclear which data you have and what you aim to obtain. Please elaborate (e.g. on file formats) and be specific.


I have paired reads, contigs, scaffolds and a GFF file, and now I want to annotate the genes: protein-coding genes, small non-coding genes, long non-coding genes, other non-coding genes, pseudogenes and gene transcripts.


Maybe this repo suits you: https://github.com/Ensembl/ensembl-annotation, but it hasn't been finished.


Actually, I don't understand how to start. Can you tell me how to get started? Can you please explain this command:

find . -name '*.p[l|m]' -exec perltidy -pro=perltidyrc -b {} \;
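
For readers with the same question: this command is a code-formatting step for the repository's Perl sources, not part of annotating a genome. `find` walks the current directory, selects files whose names match `*.p[l|m]`, and runs `perltidy` on each one; `-pro=perltidyrc` names the settings file and `-b` rewrites each file in place while keeping a backup. The Python sketch below only mimics the file selection (the function name and the file names are made up for illustration). It shows that the bracket expression matches a single character out of `l`, `|` or `m`, so `[lm]` would pick up the same Perl scripts and modules.

```python
import fnmatch

# Same glob as in the find command. Inside [...] the "|" is a literal
# character, not an "or", so the set is: "l", "|", "m".
PATTERN = "*.p[l|m]"


def is_selected(filename: str) -> bool:
    """Return True if the glob used with find's -name would match this name."""
    return fnmatch.fnmatchcase(filename, PATTERN)


# Hypothetical file names, only to show which ones would be tidied.
for name in ["run_pipeline.pl", "Utils.pm", "README.md", "notes.py"]:
    print(name, is_selected(name))
```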


So what you have is an assembly and a GFF file. Please edit your post to make this clearer.

6.0 years ago
Michael 55k

There is currently no easy or streamlined way to install the Ensembl annotation pipeline locally, so I do not recommend even attempting this as a beginner. This doesn't mean it has to stay like this: Ensembl and EBI have been working on a distributed Ensembl infrastructure within Elixir, which involves the EBI, Elixir-Norway and Sweden. Possibly, part of the outcome will be a Docker container that runs the whole annotation pipeline with documentation. Have a look at the webinar to see if you might be interested in testing it out anyway. If you want, I can try to find out more about the current state of the Ensembl Docker images.

In the meantime, I recommend using the MAKER2 pipeline.

Update:

Unfortunately, it is unlikely that there will be an installable Ensembl annotation pipeline, in a Docker container or otherwise, in the foreseeable future. The efforts towards distributed Ensembl have mainly focussed on the services, like the genome browser and back-end. In summary, only Ensembl itself can run the Ensembl annotation pipeline. Also, the Ensembl annotation pipeline relies heavily on protein evidence, while in your case you might mostly have RNA-seq evidence. For that kind of evidence, MAKER is more suitable.


Can MAKER2 give these? Protein-coding genes, small non-coding genes, long non-coding genes, other non-coding genes, pseudogenes, gene transcripts.


Possibly not all of them; what you can get also depends on the evidence data you have. De novo prediction of non-coding genes is often pretty bad anyway, except for tRNA. I recommend you start with a feasible approach first, e.g. looking at the MAKER docs and installing some of its dependencies, then predicting the protein-coding genes and transcripts plus tRNA. Don't try to solve everything in one go; getting MAKER to run is tough enough. Try bioconda for installing dependencies.
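
One way to keep track of this step-by-step approach is to tally the feature types in column 3 of whatever GFF3 file you have, whether that is your existing file or the output of an annotation run. The Python sketch below is only an illustration: the function name and the example lines are made up, and it assumes a standard nine-column, tab-separated GFF3.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            break  # an embedded FASTA section has started; no more features
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed feature line
        counts[fields[2]] += 1
    return counts


# Made-up example records; with a real file you would pass an open file
# handle instead, e.g. count_feature_types(open("my_annotation.gff")).
example = [
    "##gff-version 3",
    "scaffold_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "scaffold_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "scaffold_1\tmaker\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=mrna1",
    "scaffold_1\tmaker\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=mrna1",
    "scaffold_1\ttRNAscan-SE\ttRNA\t2000\t2072\t.\t-\t.\tID=trna1",
]

for feature_type, n in sorted(count_feature_types(example).items()):
    print(feature_type, n)
```

Feature types that are missing from the tally, such as pseudogenes or lncRNAs, are the categories the chosen tools and evidence did not cover.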

6.0 years ago
EagleEye 7.6k

You may convert your Ensembl GTF into a gene-based annotation table (tab-delimited). Then you can import this simple table into R or just use Linux command-line tools to annotate your results.

Check out this post A: extract only geneID and gene symbol from GTF file
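
As a rough illustration of the conversion described above, the following Python sketch reads GTF lines and writes one tab-delimited row per gene. It assumes Ensembl-style attributes (`gene_id`, `gene_name`, `gene_biotype`) in column 9; the function names and the example records are invented for illustration, and it is not the method from the linked post.

```python
import csv
import re
import sys
from typing import Dict, Iterable, Iterator, List

# Matches attribute pairs in column 9 such as: gene_id "GENE0001";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(column9: str) -> Dict[str, str]:
    """Turn the GTF attribute column into a key -> value dictionary."""
    return dict(ATTRIBUTE.findall(column9))


def gene_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one row per 'gene' feature: id, name, biotype, seqname, start, end, strand."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = parse_attributes(fields[8])
        yield [
            attrs.get("gene_id", ""),
            attrs.get("gene_name", ""),  # empty if the gene has no symbol
            attrs.get("gene_biotype", ""),
            fields[0], fields[3], fields[4], fields[6],
        ]


# Made-up records; with a real file, pass an open file handle instead.
example_gtf = [
    '#!genome-build example\n',
    '1\tensembl\tgene\t1000\t5000\t.\t+\t.\tgene_id "GENE0001"; gene_name "abc1"; gene_biotype "protein_coding";\n',
    '1\tensembl\ttranscript\t1000\t5000\t.\t+\t.\tgene_id "GENE0001"; transcript_id "TRAN0001";\n',
    '2\tensembl\tgene\t200\t800\t.\t-\t.\tgene_id "GENE0002"; gene_biotype "lncRNA";\n',
]

writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
writer.writerow(["gene_id", "gene_name", "gene_biotype", "seqname", "start", "end", "strand"])
writer.writerows(gene_rows(example_gtf))
```

The resulting table can be redirected to a file and read into R with `read.delim`, or joined to other results by `gene_id` with command-line tools.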
