Where to upload a protein-coding gene annotation?
4.7 years ago
Picasa ▴ 650

Hi all,

I have annotated my genome with the classic combination of MAKER + Blast2GO. I am now writing a paper about this genome and its protein-coding gene annotation.

What is usually done to make a protein-coding gene annotation available? Do people upload it to their GitHub or to a specific database?

Thanks for your help.

maker protein • 1.6k views

You should be able to submit your genome to NCBI. Directions here. I linked eukaryotic genome submission directions since you have used MAKER.


Submitting structural annotations to NCBI is fine, but submitting functional annotations is quite another matter. There is a tool, GAG (https://genomeannotation.github.io/GAG/), that helps with both, but for the functional part it is very difficult to get from MAKER annotations to ones that NCBI accepts. If I understand you correctly, you used Blast2GO for functional annotation? In that case things might be different. Sorry, I have no experience with that route.


Both you and @Juke-34 seem to work a lot with annotation. What is your general experience with NCBI/ENA? Do they check the annotations you submit in some way? Since both are archival databases, an annotation that is of poor quality when submitted will keep being propagated, because people rarely seem to go back and make corrections in GenBank. For the genomes you have submitted so far, what percentage of each genome was annotated? How complete were the assemblies?


I wouldn't say I work a lot with annotation; I merely have some experience. I have never used ENA, but I must say that fulfilling the annotation requirements set by NCBI may take a few messages back and forth with the curators. As far as I can tell, most people don't submit structural or functional annotations to NCBI. I have tried to annotate all of the genomes I have submitted so far (public or not yet). I am not really sure how to express "how complete" the genomes were. The dromedary genome assembly was an Illumina assembly scaffolded with Chicago and Hi-C reads, with gaps filled using 11x PacBio coverage. The other genome assemblies (both not yet public, pending publication) were 10x Genomics assemblies.


They perform very few checks on what you submit. You can find the list of rules from ENA here.

The most important filters for the assembly are:
* No sequence < 20 bp.
* No sequences starting or ending with Ns.

The most important filter for the annotation is:
* No intron < 10 bp.

The rest is just syntax checks.
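As a rough illustration, the three filters listed above could be pre-checked locally before submission with something like the sketch below. This is not ENA's validator; the function names (`parse_fasta`, `check_assembly`, `short_introns`) are made up for this example, and exon coordinates are assumed to be 1-based and inclusive, as in GFF3.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def check_assembly(records: Dict[str, str], min_len: int = 20) -> List[Tuple[str, str]]:
    """Flag sequences shorter than min_len or with an N at either end."""
    problems = []
    for name, seq in records.items():
        if len(seq) < min_len:
            problems.append((name, f"shorter than {min_len} bp"))
        if seq[:1].upper() == "N" or seq[-1:].upper() == "N":
            problems.append((name, "starts or ends with N"))
    return problems


def short_introns(exons: List[Tuple[int, int]], min_intron: int = 10) -> List[Tuple[int, int, int]]:
    """Return (intron_start, intron_end, length) for introns below min_intron.

    `exons` holds the (start, end) pairs of ONE transcript, 1-based inclusive.
    Overlapping or adjacent exons give a length <= 0 and are flagged as well.
    """
    ordered = sorted(exons)
    found = []
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        length = start2 - end1 - 1
        if length < min_intron:
            found.append((end1 + 1, start2 - 1, length))
    return found
```

For example, `check_assembly(parse_fasta(open("assembly.fa").read()))` would list the offending scaffolds (the file name is a placeholder), and `short_introns` would be called once per transcript with the exon coordinates taken from the GFF3 produced by MAKER.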

Everything ends up in the INSDC databases (ENA, GenBank, DDBJ). NCBI has its own annotation pipeline and can decide at any time to annotate any assembly from GenBank, regardless of whether an annotation already exists. In the end, when several annotations exist, NCBI provides its own by default. I have seen cases where groups did a better job of annotating their organism, but people going to RefSeq use the NCBI annotation just because it is the default one (often we do not realise that another annotation exists).

All the genomes I submitted to ENA came with an annotation (which is normal, because performing the annotations is part of my job :) ). The assemblies are always decent: roughly what is expected in terms of length, and the same in terms of completeness (BUSCO at the genome level). In terms of fragmentation they are much more heterogeneous, but this keeps improving with the use of long reads.

4.7 years ago
Juke34 8.9k

For the European entry point to INSDC, use ENA.
For the US entry point to INSDC, use NCBI.

In both cases you must convert your annotation.
For submission to ENA you need to use EMBLmyGFF3.
For submission to NCBI you need to use GAG.
