4.4 years ago
758104598
Hi guys! Please help me with annotation files (GFF3). I found the reference genome on NCBI, but no annotation files were uploaded for it. Are there any tools I can use to create this file, so that I can use it for my RNA-seq analysis? Thanks very much for your kind replies!
For GRCh38, they are available HERE. You may have to convert GTF to GFF3.
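For the GTF-to-GFF3 step, dedicated tools such as gffread or AGAT are the usual route, because they also build the gene/transcript hierarchy with proper ID/Parent links. The snippet below is only a minimal sketch of what changes between the two formats: it rewrites the 9th column from GTF style (`key "value";`) to GFF3 style (`key=value`) and percent-encodes GFF3's reserved characters. It assumes attribute values never contain `;`, and the function names are hypothetical.

```python
# Minimal GTF -> GFF3 sketch: only the 9th (attribute) column is rewritten.
# Assumption: attribute values never contain ';'. No ID/Parent hierarchy is built.


def _escape_gff3(value: str) -> str:
    """Percent-encode characters that are reserved inside GFF3 column 9."""
    # '%' must be encoded first so later replacements are not double-encoded
    for ch, code in (("%", "%25"), (";", "%3B"), ("=", "%3D"), ("&", "%26"), (",", "%2C")):
        value = value.replace(ch, code)
    return value


def gtf_attrs_to_gff3(attr_field: str) -> str:
    """Turn 'gene_id "G1"; transcript_id "T1";' into 'gene_id=G1;transcript_id=T1'."""
    pairs = []
    for part in attr_field.split(";"):
        part = part.strip()
        if not part:
            continue  # trailing ';' leaves an empty piece
        key, _, value = part.partition(" ")
        value = value.strip().strip('"')  # GTF values are usually double-quoted
        pairs.append(f"{key}={_escape_gff3(value)}")
    return ";".join(pairs)


def gtf_line_to_gff3(line: str) -> str:
    """Convert one tab-separated GTF feature line; columns 1-8 are copied unchanged."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    cols[8] = gtf_attrs_to_gff3(cols[8])
    return "\t".join(cols)
```

For a made-up line such as `chr1 example exon 100 200 . + . gene_id "G1"; transcript_id "T1";` (tab-separated), the attribute column becomes `gene_id=G1;transcript_id=T1`. For real data, prefer gffread or AGAT.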
Excuse my short answer - I am on a call.
This is not for human genome but for some other reference genome.
758104598: If no annotations were uploaded, then you would need to annotate the genome yourself. If the submitted genome file contains annotations (e.g. it is in GenBank format), then you should be able to create a GFF file from it.
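Before converting, it is worth checking whether a GenBank flat file (.gb/.gbff) actually carries gene models or only the mandatory `source` feature. A minimal sketch, assuming the standard flat-file layout where feature keys start in column 6 and qualifiers are indented further; the function names and the choice of `gene`/`mRNA`/`CDS` as "gene model" keys are illustrative assumptions.

```python
# Sketch: list the feature keys in a GenBank flat file so you can tell whether
# it contains gene models or only a 'source' feature. Names are hypothetical.

GENE_MODEL_KEYS = {"gene", "mRNA", "CDS"}


def genbank_feature_keys(genbank_text: str) -> set[str]:
    keys: set[str] = set()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True  # the feature table of a record begins
            continue
        if not in_features:
            continue
        if line[:1] not in ("", " "):
            # Any top-level keyword (ORIGIN, CONTIG, BASE COUNT, //) ends the table
            in_features = False
            continue
        # Feature keys start in column 6; qualifier lines (/organism=...) start later
        if line.startswith("     ") and len(line) > 5 and line[5] != " ":
            keys.add(line.split()[0])
    return keys


def has_gene_models(genbank_text: str) -> bool:
    return bool(genbank_feature_keys(genbank_text) & GENE_MODEL_KEYS)
```

Usage, with a hypothetical file name: `with open("genome.gbff") as fh: print(has_gene_models(fh.read()))`. A result of `False` means that converting the file to GFF3 will not produce usable gene annotations.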
Thanks for your reply. My target genome is a fungal genome, and all I can find is a genome FASTA file with no annotation file. Could you please tell me where I can find a tutorial on genome annotation? Sorry for the wordy question. Hope to hear from you!
Can you share which fungal genome it is? Perhaps you are not looking in the right place at NCBI.
There are programs like Maker (LINK) that can be used for annotation.
Thanks. It's an Aureobasidium melanogenum genome. There are two reference genomes in NCBI: one has an annotation file and the other doesn't (the latest one, which I want, has not been annotated). Its GenBank assembly accession is GCA_002156615.1.
There is a genome entry for this organism where annotation is available in GenBank format. You can look at converting that to GFF/GTF.
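The core of a GenBank-to-GFF3 conversion is turning INSDC location strings such as `complement(join(100..200,300..400))` into one GFF3 row per segment, with a phase for CDS rows. This is a minimal sketch of that step only, not a full converter; it assumes all segments lie on one strand, `codon_start=1`, and no single-base or remote (other-accession) locations. The function names and the `GenBank` source label are hypothetical.

```python
import re

# Matches 'start..end' ranges, tolerating the partial-end markers '<' and '>'
_RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")


def parse_location(loc: str) -> tuple[str, list[tuple[int, int]]]:
    """Return (strand, [(start, end), ...]) for a simple GenBank location.

    Assumptions: every part is on the same strand; single-base and remote
    locations are not handled.
    """
    strand = "-" if "complement(" in loc else "+"
    spans = [(int(a), int(b)) for a, b in _RANGE.findall(loc)]
    return strand, spans


def feature_to_gff3(seqid: str, ftype: str, loc: str, feat_id: str,
                    source: str = "GenBank") -> list[str]:
    """Emit one GFF3 row per segment; segments of one feature share the same ID."""
    strand, spans = parse_location(loc)
    # Walk segments in transcription order so CDS phases accumulate correctly
    ordered = sorted(spans, reverse=(strand == "-"))
    rows, consumed = [], 0
    for start, end in ordered:
        if ftype == "CDS":
            # Phase = bases to skip before the next full codon (assumes codon_start=1)
            phase = str((3 - consumed % 3) % 3)
            consumed += end - start + 1
        else:
            phase = "."
        rows.append("\t".join([seqid, source, ftype, str(start), str(end),
                               ".", strand, phase, f"ID={feat_id}"]))
    return rows
```

Established tools (for example Biopython together with the bcbio-gff package) handle the many cases this sketch skips, so prefer them for real files.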
Finally, I converted this file to GFF3 format, but it is not well annotated: it lacks CDS and other features, so the 9th column has no attributes such as protein_id, locus_tag, GO, etc.
Here are snapshots of the GBFF and GFF3 files. Does this mean I can't use it, or is there another way to convert it? Thanks again!
gbff:
COMMENT     ##Genome-Assembly-Data-START##
            Assembly Method        :: Velvet v. Dec-2016
            Genome Representation  :: Full
            Expected Final Version :: Yes
            Genome Coverage        :: 219.7x
            Sequencing Technology  :: Illumina
            ##Genome-Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..1192554
                     /organism="Aureobasidium melanogenum"
                     /mol_type="genomic DNA"
                     /submitter_seqid="scaffold10_size1192554"
                     /strain="HN6.2"
                     /isolation_source="offshore surface water in Dongfeng saltworks"
                     /db_xref="taxon:46634"
                     /country="China: Qingdao"
                     /collection_date="2008-06-06"
ORIGIN
        1 ccctaaccct aaccctaacc ctaaccctaa ccctaaccct aaccctaacc ctaaccctaa
       61 ccctaaccct aaccctaacc ctaaccctaa ccctaaccct aaccctaacc ctaaccctaa
      121 ccctaatatt ctccctaaac cctaccagag gttagggtcg gggttccctt accctcacct
      181 ctacccttac ctcacttcac ctctcccctc tcctctcccc ttcctttccc ctcccctccc
      241 ttccctttat tttatgtata ttaaatctaa tctatattgc aaaaattctt cctttgccct
gff3: MWII01000009 GenBank region 1 1255954 . + 1 ID=MWII01000009;Name=MWII01000009;Dbxref=BioProject:PRJNA376057,taxon:46634;Name=MWII01000009;Note=Aureobasidium melanogenum strain HN6.2 scaffold9_size1255954%2C whole genome shotgun sequence.,##Genome-Assembly-Data-START## Assembly Method :: Velvet v. Dec-2016 Genome Representation :: Full Expected Final Version :: Yes Genome Coverage :: 219.7x Sequencing Technology :: Illumina ##Genome-Assembly-Data-END## ;collection_date=2008-06-06;comment1=##Genome-Assembly-Data-START## Assembly Method :: Velvet v. Dec-2016 Genome Representation :: Full Expected Final Version :: Yes Genome Coverage :: 219.7x Sequencing Technology :: Illumina ##Genome-Assembly-Data-END## ;country=China: Qingdao;date=22-MAY-2017;isolation_source=offshore surface water in Dongfeng saltworks;mol_type=genomic DNA;organism=Aureobasidium melanogenum;strain=HN6.2;submitter_seqid=scaffold9_size1255954
##FASTA
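A quick way to answer "can I use this GFF3?" is to tally column 3 (the feature type). If the only type is `region`, the file describes the scaffolds themselves and contains no gene models. A minimal sketch; it assumes a typical read-counting workflow (featureCounts and htseq-count count over `exon` features by default) needs at least gene/exon-level rows. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def gff3_type_counts(lines: Iterable[str]) -> Counter:
    """Count feature types (column 3) in GFF3 lines, stopping at ##FASTA."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # ignore malformed rows
            counts[cols[2]] += 1
    return counts


def usable_for_rnaseq(counts: Counter) -> bool:
    # Assumption: counting needs gene-model rows, not just scaffold 'region' rows
    return any(counts[t] > 0 for t in ("gene", "mRNA", "exon", "CDS"))
```

Usage, with a hypothetical file name: `with open("genome.gff3") as fh: print(gff3_type_counts(fh))`. A file like the one above yields only `region` counts, so it would need de novo annotation before RNA-seq quantification.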
It seems that this was formerly called Aureobasidium pullulans? It is possible to construct an annotation database for it in R:
Thanks for your help. I will try to align my transcripts to this genome.
Thanks for your reply, but my target genome is an unannotated fungal genome that I need to annotate myself, and I don't know which tools I should use. Have a nice call!