Hi! I'm a biotechnology undergrad (wet lab) currently doing my master's in bioinformatics. My project is on fungal genome assembly and annotation. I have done the genome assembly and am now struggling with the annotation because I have no prior experience dealing with genomic data. I became interested in becoming a bioinformatician during the 2nd year of my degree and was fed up with wet lab work, so I took this project. My current supervisor also has no hands-on bioinformatics experience. His fallback was to outsource the annotation if no master's student took his project. My external co-supervisor barely helps me with this project, as he also has a lot of bioinformatics-related projects to deal with.
Anyway, does genome annotation require genome assembly data, transcriptomic data and protein data? I only have my genome assembly. The fungal genome I assembled is new and is not in the NCBI database. How can I get transcriptomic data and protein data? Which database should I use, and how do I BLAST against it?
Sorry if the question sounds simple, but I can't work it out or even find it on Google! This project has stressed me out and I feel like giving up on a bioinformatics career. Are there any websites/resources that can help me on this path? Thanks for your help!
So you already did the assembly. Let's assume that this is all correctly done.
You can use MAKER (MAKER2): https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-491
NCBI also has some tools for it: https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/
You can also build your own pipeline, and I believe you need to start with gene prediction: https://en.wikipedia.org/wiki/List_of_gene_prediction_software. After that you can BLAST the predicted genes.
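To get from gene prediction to BLAST you need the predicted genes as sequences. Here is a minimal sketch of that step, assuming the predictor wrote a standard 9-column GFF/GFF3 file with `gene` features and that the first word of each FASTA header matches the GFF sequence ID. The function names and file names are made up for illustration and do not come from any of the tools above.

```python
def read_fasta(path):
    """Read a FASTA file into a dict mapping sequence ID to sequence string."""
    sequences = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    sequences[name] = "".join(chunks)
                # Keep only the first word of the header as the ID.
                name = line[1:].split()[0]
                chunks = []
            else:
                chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_genes(fasta_path, gff_path, out_path, feature="gene"):
    """Write one FASTA record per matching GFF feature; return the record count."""
    contigs = read_fasta(fasta_path)
    count = 0
    with open(gff_path) as gff, open(out_path, "w") as out:
        for line in gff:
            # Skip comments, directives and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9 or cols[2] != feature:
                continue
            seqid, strand = cols[0], cols[6]
            start, end = int(cols[3]), int(cols[4])
            if seqid not in contigs:
                continue
            # GFF coordinates are 1-based and inclusive.
            seq = contigs[seqid][start - 1:end]
            if strand == "-":
                seq = reverse_complement(seq)
            count += 1
            out.write(f">{seqid}:{start}-{end}({strand})\n{seq}\n")
    return count
```

For example, `extract_genes("contigs.fasta", "predictions.gff", "genes.fasta")` would write the gene regions to `genes.fasta`, which can then be used as the query for a BLAST search. Note that these are genomic gene regions and still contain introns; if your predictor also outputs protein or CDS sequences, those are usually the better query.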
Hi there,
Could you please provide more information about the data?
What stage is the genome assembly? e.g. scaffolds or draft genome?
Is it a new species or a new strain of fungi?
Is there an annotated genome of close species/strain available?
Although I am not an expert in fungal bioinformatics, a Google search returned the following links that might be of interest to you.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3207268/
https://github.com/CompSynBioLab-KoreaUniv/FunGAP
fungal genome annotation
What stage is the genome assembly? - The assembly produced only contigs.
Is it a new species or a new strain of fungi? - A new species, with only a few strain genomes in the database.
Is there an annotated genome of close species/strain available? - Yes.
My problem is how to find transcriptomic reads, as my own data is not transcriptomic.
Have a look at all the links; AUGUSTUS is an easy tool. They also have a web version: http://bioinf.uni-greifswald.de/augustus/submission.php
If that does not suit your data, there are other options: https://en.wikipedia.org/wiki/List_of_gene_prediction_software
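If you would rather run AUGUSTUS locally than use the web form, a rough sketch of wrapping the command-line tool from Python is below. It assumes `augustus` is installed and on your PATH. The species identifier `aspergillus_nidulans` is only a placeholder for whichever pre-trained model is closest to your fungus (`augustus --species=help` lists them), and the helper names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_augustus_command(genome_fasta, species):
    """Return the argument list for an AUGUSTUS run with GFF3 output."""
    return ["augustus", f"--species={species}", "--gff3=on", str(genome_fasta)]


def run_augustus(genome_fasta, species, out_gff):
    """Run AUGUSTUS and save its standard output (the predictions) to out_gff."""
    if shutil.which("augustus") is None:
        raise RuntimeError("augustus executable not found on PATH")
    cmd = build_augustus_command(genome_fasta, species)
    with open(out_gff, "w") as handle:
        # AUGUSTUS prints its predictions to stdout, so redirect it to a file.
        subprocess.run(cmd, stdout=handle, check=True)
    return Path(out_gff)
```

A call such as `run_augustus("contigs.fasta", "aspergillus_nidulans", "predictions.gff")` would then leave the predictions in `predictions.gff`. Since you have no transcriptomic data, this is a purely ab initio prediction, so how good it is depends on how close the chosen species model is to your genome.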
You can use Funannotate. It is flexible about input requirements; the only thing it needs is an assembly. Docs are here: http://funannotate.readthedocs.io/en/latest/. GitHub here: https://github.com/nextgenusfs/funannotate