3.5 years ago
arko.bmb.du
Hello, I am new to bioinformatics. How can I build a GFF/GTF file for my genome of interest? (I have sequence data in FASTA format, and the headers don't contain any annotation information.)
Which species are you studying? It would help to know, because the approach may differ depending on the species.
Thanks a lot for your response. I am working with a fish species named Tenualosa ilisha. No gtf/gff file for this species is available online. Only genome assembly data (at scaffold level) is available.
You probably know the research on ilisha better than I do, but I found that a draft genome assembly of Tenualosa ilisha was recently reported (Mohindra et al., Scientific Reports, 2019). I also looked for annotated genes for ilisha and could not find any. It seems no annotation is available yet, because gene annotation studies for ilisha are still at a very early stage.
As an alternative, I think it would be worth looking at the methods sections of the research papers on ilisha. Have you tried that?
Okay. I will go through the method section again. Thanks.
For example, if you are studying the human genome and looking for the annotation that corresponds to the hg38 assembly, you can get it through the UCSC Genome Browser.
Step 1. Go to the UCSC Genome Browser.
Step 2. Follow links below:
In the top navigation bar, click the 'Downloads' menu -> Genome data -> Human -> Dec. 2013 (GRCh38/hg38) -> Genome sequence files and select annotations (2bit, GTF, GC-content, etc) -> Standard genome sequence files and select annotations (2bit, GTF, GC-content, etc)
This will take you to this page. If you open the 'genes' folder there, you will find several annotation files for human.
Step 3. Download the annotation file in GTF format that is relevant to your research.
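Once a GTF file is downloaded, a quick sanity check is to count the feature types in its third column. Below is a minimal Python sketch of that check. The helper names (`count_gtf_features`, `open_gtf`) and the sample lines are made up for illustration and are not taken from any UCSC file; the sketch only assumes the standard 9-column, tab-separated GTF layout.

```python
import gzip
from collections import Counter


def count_gtf_features(lines):
    """Count feature types (column 3) in an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A valid GTF record has exactly 9 tab-separated columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


def open_gtf(path):
    """Open a plain or gzip-compressed GTF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Made-up records, only to show the expected layout.
sample = [
    "#comment line",
    'chr1\texample\ttranscript\t100\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\texample\texon\t100\t300\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\texample\texon\t600\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
]
feature_counts = count_gtf_features(sample)
print(dict(feature_counts))
```

For a real download you would pass the file handle instead of the sample list, e.g. `count_gtf_features(open_gtf(path))`, where `path` is wherever you saved the file.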
Oh, I misunderstood your question. You are asking not how to find an annotation but how to build one for your genome of interest. In that case, I think what you need at this step is to align the sequence data you have to the ilisha genome/scaffolds.
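To give an idea of what the end product looks like, here is a rough Python sketch that turns alignment hits into GFF3 lines. Everything in it is an assumption for illustration: the function name `hits_to_gff3`, the scaffold names, the coordinates and the hit names are all invented, and a real annotation would come from a proper alignment or gene prediction tool rather than hand-written tuples. It only relies on the GFF3 layout itself (9 tab-separated columns, 1-based inclusive coordinates).

```python
def hits_to_gff3(hits, source="example_alignment"):
    """Format (seqid, start, end, strand, name) hits as GFF3 text.

    Coordinates are assumed to be 1-based and inclusive, as GFF3 requires.
    """
    lines = ["##gff-version 3"]
    for i, (seqid, start, end, strand, name) in enumerate(hits, 1):
        # GFF3 requires start <= end; the strand column carries orientation.
        if start > end:
            start, end = end, start
        attributes = f"ID=match{i};Name={name}"
        lines.append("\t".join([
            seqid,       # column 1: scaffold/sequence name
            source,      # column 2: program or data source
            "match",     # column 3: feature type
            str(start),  # column 4: start
            str(end),    # column 5: end
            ".",         # column 6: score (unknown here)
            strand,      # column 7: strand
            ".",         # column 8: phase (not applicable)
            attributes,  # column 9: attributes
        ]))
    return "\n".join(lines) + "\n"


# Hypothetical hits on hypothetical scaffolds.
example_hits = [
    ("scaffold_1", 1200, 1850, "+", "hitA"),
    ("scaffold_7", 5300, 4900, "-", "hitB"),
]
gff3_text = hits_to_gff3(example_hits)
print(gff3_text, end="")
```

Writing `gff3_text` to a file gives something a genome browser can load, but it is only a container for the alignment coordinates, not a curated gene annotation.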