Question about annotation file
4.4 years ago
758104598

Hi everyone! Please help me with annotation files (GFF3). I found the reference genome on NCBI, but no annotation files were uploaded with it. Are there any tools I can use to create this file, so that I can use it for my RNA-seq analysis? Thanks very much for your kind replies!

For GRCh38, annotation files are available HERE. You may have to convert the GTF file to GFF3.

Excuse the short answer; I am on a call.
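A minimal sketch of the GTF-to-GFF3 conversion, assuming the Bioconductor package rtracklayer is installed. The function name `gtf_to_gff3` and the file names are hypothetical. It only rewrites the format: GTF attributes such as `gene_id` become `key=value` pairs, but no GFF3 `ID`/`Parent` hierarchy is built.

```r
# Sketch: convert GTF to GFF3 with Bioconductor's rtracklayer.
# Assumes rtracklayer is installed, e.g. BiocManager::install("rtracklayer").
library(rtracklayer)

gtf_to_gff3 <- function(gtf_path, gff3_path) {
  # Read the GTF into a GRanges; column-9 attributes become metadata columns
  gr <- import(gtf_path, format = "gtf")
  # Write the same features as GFF3; attributes are emitted as key=value pairs,
  # but no ID/Parent links are synthesized between genes, transcripts and exons
  export(gr, gff3_path, format = "gff3")
  invisible(gr)
}
```

Usage would look like `gtf_to_gff3("annotation.gtf", "annotation.gff3")`. If downstream tools need a proper gene/transcript hierarchy, a dedicated converter such as gffread is the more usual choice.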

This question is not about the human genome but about some other reference genome.

758104598: If no annotations were uploaded, you would need to annotate the genome yourself. If the submitted genome file contains annotations (e.g. a GenBank-format file that includes gene or CDS features), you should be able to create a GFF file from it.
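Before converting, a quick check is to list the feature keys a GenBank record actually contains. Below is a minimal base-R sketch, assuming a standard fixed-column GenBank flat file. The function name `genbank_feature_keys` is hypothetical. If only `source` is reported, the record carries sequence and metadata but no gene models, and any GFF converted from it will have no gene or CDS rows.

```r
# Sketch: tabulate the feature keys (source, gene, mRNA, CDS, ...) found in a
# GenBank flat file. Assumes the standard fixed-column layout: top-level
# keywords start in column 1, feature keys in column 6, qualifiers in column 22.
# The function name is hypothetical.
genbank_feature_keys <- function(lines) {
  keys <- character(0)
  in_features <- FALSE
  for (ln in lines) {
    if (startsWith(ln, "FEATURES")) {
      in_features <- TRUE                     # start of the feature table
    } else if (grepl("^[A-Z/]", ln)) {
      in_features <- FALSE                    # LOCUS, ORIGIN, CONTIG, // ... end it
    } else if (in_features && grepl("^ {5}[^ ]", ln)) {
      # Exactly five leading spaces marks a feature key line, e.g. "     CDS ..."
      keys <- c(keys, sub("^ {5}([^ ]+).*$", "\\1", ln))
    }
  }
  table(keys)
}
```

Typical use: `genbank_feature_keys(readLines("assembly_genomic.gbff"))`, where the file name is hypothetical.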

Thanks for your reply. My target genome is a fungal genome, and all I can find is a genome FASTA file, with no annotation file. Could you please tell me where I can find a tutorial on genome annotation? Sorry for the wordy question. Hope to hear from you!

Can you share which fungal genome it is? Perhaps you are not looking in the right place at NCBI.

There are programs like Maker (LINK) that can be used for annotation.

Thanks. It's an Aureobasidium melanogenum genome. There are two reference genomes for it in NCBI: one has an annotation file and the other doesn't (the latest one, which is the one I want, has not been annotated). Its GenBank assembly accession is GCA_002156615.1.

There is a genome entry for this organism where annotation is available in GenBank format. You can look at converting that to GFF/GTF.

I finally converted this file to GFF3 format, but it turns out not to be a well-annotated file: it lacks CDS and other features, so the 9th column has none of the attributes I need, such as protein_id, locus_tag, GO, etc.

Here are snapshots of the GBFF and GFF3 files. Does this mean I can't use it, or is there another way to convert it? Thanks again!

gbff:
COMMENT     ##Genome-Assembly-Data-START##
            Assembly Method        :: Velvet v. Dec-2016
            Genome Representation  :: Full
            Expected Final Version :: Yes
            Genome Coverage        :: 219.7x
            Sequencing Technology  :: Illumina
            ##Genome-Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..1192554
                     /organism="Aureobasidium melanogenum"
                     /mol_type="genomic DNA"
                     /submitter_seqid="scaffold10_size1192554"
                     /strain="HN6.2"
                     /isolation_source="offshore surface water in Dongfeng saltworks"
                     /db_xref="taxon:46634"
                     /country="China: Qingdao"
                     /collection_date="2008-06-06"
ORIGIN
        1 ccctaaccct aaccctaacc ctaaccctaa ccctaaccct aaccctaacc ctaaccctaa
       61 ccctaaccct aaccctaacc ctaaccctaa ccctaaccct aaccctaacc ctaaccctaa
      121 ccctaatatt ctccctaaac cctaccagag gttagggtcg gggttccctt accctcacct
      181 ctacccttac ctcacttcac ctctcccctc tcctctcccc ttcctttccc ctcccctccc
      241 ttccctttat tttatgtata ttaaatctaa tctatattgc aaaaattctt cctttgccct

gff3:
MWII01000009 GenBank region 1 1255954 . + 1 ID=MWII01000009;Name=MWII01000009;Dbxref=BioProject:PRJNA376057,taxon:46634;Name=MWII01000009;Note=Aureobasidium melanogenum strain HN6.2 scaffold9_size1255954%2C whole genome shotgun sequence.,##Genome-Assembly-Data-START## Assembly Method :: Velvet v. Dec-2016 Genome Representation :: Full Expected Final Version :: Yes Genome Coverage :: 219.7x Sequencing Technology :: Illumina ##Genome-Assembly-Data-END## ;collection_date=2008-06-06;comment1=##Genome-Assembly-Data-START## Assembly Method :: Velvet v. Dec-2016 Genome Representation :: Full Expected Final Version :: Yes Genome Coverage :: 219.7x Sequencing Technology :: Illumina ##Genome-Assembly-Data-END## ;country=China: Qingdao;date=22-MAY-2017;isolation_source=offshore surface water in Dongfeng saltworks;mol_type=genomic DNA;organism=Aureobasidium melanogenum;strain=HN6.2;submitter_seqid=scaffold9_size1255954
##FASTA
>MWII01000010
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAATATTCTCCCTAAACCCTACCAGAGGTTAGGGTCGGGGTTCCCTTACCCTCACCT
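To see at a glance which feature types a converted GFF3 file actually contains, you can tabulate column 3. Below is a minimal base-R sketch, assuming tab-separated columns. It stops at an embedded `##FASTA` section. The function name `gff3_feature_types` is hypothetical. A file whose only rows are `region` lines, like the snapshot above, has no gene or CDS features that could carry protein_id or locus_tag attributes.

```r
# Sketch: count the feature types (column 3) in a GFF3 file, skipping
# "#" comment/directive lines and everything after a "##FASTA" directive.
# Assumes tab-separated columns; the function name is hypothetical.
gff3_feature_types <- function(lines) {
  fasta_start <- match("##FASTA", lines)
  if (!is.na(fasta_start)) {
    lines <- lines[seq_len(fasta_start - 1L)]  # drop the embedded sequences
  }
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Column 3 holds the feature type (region, gene, mRNA, CDS, ...)
  types <- vapply(fields, function(f) if (length(f) >= 3) f[3] else NA_character_,
                  character(1))
  table(types)  # NA entries from malformed lines are excluded by default
}
```

Typical use: `gff3_feature_types(readLines("converted.gff3"))`, where the file name is hypothetical.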

It seems this species was formerly classified as Aureobasidium pullulans? A prebuilt annotation database (OrgDb) for A. pullulans can be loaded in R through AnnotationHub:

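library(AnnotationHub)
library(AnnotationDbi)             # provides keytypes(), keys() and select()
hub <- AnnotationHub()             # connect to the AnnotationHub metadata
query(hub, 'pullulans')            # list resources matching the species name
orgdb <- hub[['AH81695']]          # download the matching OrgDb resource

keytypes(orgdb)                    # identifier types usable as keys
## [1] "ACCNUM"   "ALIAS"    "ENTREZID" "GENENAME" "GID"      "PMID"     "REFSEQ"  
## [8] "SYMBOL"  

head(keys(orgdb, "SYMBOL"))        # first few gene symbols
## [1] "M438DRAFT_10012"  "M438DRAFT_100163" "M438DRAFT_100695" "M438DRAFT_100777"
## [5] "M438DRAFT_100786" "M438DRAFT_10091"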

Thanks for your help; I will try aligning my transcripts to this genome.

Thanks for your reply, but my target genome is a non-annotated fungal genome that I have to annotate myself, and I don't know which tools to use. Have a nice call!

4.4 years ago
Shalu Jhanwar

Here is the link to download genome annotations for mammalian vertebrates from NCBI.

Hope this helps!
