Getting a gff file from NCBI
4.1 years ago

I am looking for the GFF file for the chloroplast of Tetradesmus obliquus strain DOE0152z. Why is it so hard to find on NCBI? Can somebody point me in the right direction, please?

https://www.ncbi.nlm.nih.gov/nuccore/NEDT01000001.1

I don't see how I can get the GFF file from this record.

GFF files may only be available if the genome was annotated by NCBI.

That said, this genome is from the organism you mention above but is likely not the same strain. Only GenBank-format annotation seems to be available for it, which you could convert to a GFF file.
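
To get the GenBank record itself from a script, here is a minimal sketch that builds an NCBI E-utilities efetch request for the nuccore record linked above. It assumes the rettype=gbwithparts / retmode=text combination (GenBank flat file); the function names are made up for illustration, and NCBI asks scripted users to add tool/email (or api_key) parameters and to stay within its request-rate limits, which this sketch leaves out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_genbank_url(accession):
    """Build an efetch URL that returns a nuccore record as a GenBank flat file."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "gbwithparts",  # GenBank format, sequence included
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_genbank(accession):
    """Download the GenBank text for an accession (needs network access)."""
    with urlopen(efetch_genbank_url(accession)) as response:
        return response.read().decode("utf-8")
```

Calling fetch_genbank("NEDT01000001.1") should return the flat-file text, which can be saved as a .gbk file for conversion.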

This is the same strain. How can I convert it to a GFF file?
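
A minimal pure-Python sketch of such a conversion, assuming a GenBank flat file and handling only simple locations (12..345, complement(12..345), <1..>200). Features with join(...) or order(...) locations are skipped, only the gene, locus_tag and product qualifiers are carried over, and the ID values are invented by the script. The output is a flat feature list with no Parent links; tools such as snpEff may expect gene/transcript/CDS relationships, so treat this as a starting point rather than a finished converter (Biopython is the usual tool for heavier lifting). For an unannotated record, expect little more than the source line.

```python
import re

# Simple GenBank locations only: "12..345", "complement(12..345)", "<1..>200", "7".
SIMPLE_LOCATION = re.compile(r"^(complement\()?<?(\d+)(?:\.\.>?(\d+))?\)?$")


def parse_location(loc):
    """Return (start, end, strand) for a simple location, or None for join()/order() etc."""
    m = SIMPLE_LOCATION.match(loc)
    if m is None:
        return None
    start = int(m.group(2))
    end = int(m.group(3) or m.group(2))
    return start, end, "-" if m.group(1) else "+"


def gff_escape(value):
    """Percent-encode characters that are reserved in GFF3 column 9."""
    for ch in "%;=&,\t":  # '%' first so escapes added later are not re-escaped
        value = value.replace(ch, f"%{ord(ch):02X}")
    return value


def parse_features(gb_text):
    """Return [seqid, type, location, qualifier_lines] for each feature in a GenBank file."""
    features, seqid, in_features = [], None, False
    for line in gb_text.splitlines():
        if not line.strip():
            continue
        if line[0] != " ":  # top-level keyword: LOCUS, VERSION, FEATURES, ORIGIN, //
            parts = line.split()
            in_features = parts[0] == "FEATURES"
            if parts[0] in ("LOCUS", "VERSION") and len(parts) > 1:
                seqid = parts[1]  # VERSION (accession.version) overrides LOCUS
        elif in_features and line[5:6].strip():  # feature key starts in column 6
            parts = line.split(None, 1)
            loc = parts[1].strip() if len(parts) > 1 else ""
            features.append([seqid, parts[0], loc, []])
        elif in_features and features:
            text = line.strip()
            quals = features[-1][3]
            if text.startswith("/"):
                quals.append(text[1:])
            elif quals:
                quals[-1] += " " + text  # continuation of a long qualifier value
            else:
                features[-1][2] += text  # continuation of a long location
    return features


def genbank_to_gff3(gb_text):
    """Convert simple-location features to GFF3 lines (flat list, no Parent links)."""
    out = ["##gff-version 3"]
    for n, (seqid, ftype, loc, quals) in enumerate(parse_features(gb_text), start=1):
        parsed = parse_location(loc)
        if parsed is None:
            continue  # join(), order(), etc. are skipped in this sketch
        start, end, strand = parsed
        attrs = [f"ID={ftype}{n}"]
        for qual in quals:
            key, _, value = qual.partition("=")
            if key in ("gene", "locus_tag", "product"):
                attrs.append(key + "=" + gff_escape(value.strip('"')))
        out.append("\t".join([seqid, "GenBank", ftype, str(start), str(end),
                              ".", strand, ".", ";".join(attrs)]))
    return "\n".join(out) + "\n"
```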

If those are the genomes you are interested in, neither of them was annotated by its submitter. In that situation, what would you expect to see in the GFF3 file? Unfortunately, there is no RefSeq genome assembly for this organism, so RefSeq has not annotated this genome either.

I have some variants in the chloroplast genome that I want to annotate with snpEff, so I need a GFF file.

4.1 years ago
vkkodali_ncbi

The GFF3 file format is used to represent the annotation of a genome, such as where the genes, transcripts and exons are located. For this particular organism, there are two genome assemblies available at NCBI: GCA_900108755.1 and GCA_002149895.1. The submitters of these assemblies did not provide any annotation, so there is no data from which to create a GFF3 file. RefSeq genome assemblies (with the GCF_ accession prefix) are always annotated, but unfortunately there is no RefSeq assembly for this organism. Perhaps an external organization or lab has annotated this genome, in which case you could download a GFF3 file from them, but I do not know of any.
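
To see why an unannotated assembly gives you nothing useful, it helps to look at what a GFF3 data line holds: nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). A minimal parsing sketch follows; parse_gff3_line and count_feature_types are hypothetical helpers, and a file with only ## header lines and no feature rows simply yields an empty count.

```python
from collections import Counter

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 data line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"], record["end"] = int(fields[3]), int(fields[4])
    # Column 9 is a ';'-separated list of key=value pairs ('.' means empty).
    record["attributes"] = dict(
        kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv
    )
    return record


def count_feature_types(gff_text):
    """Count feature rows by type (column 3), ignoring comments and any FASTA section."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        counts[parse_gff3_line(line)["type"]] += 1
    return counts
```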

vkkodali, I think you should post your comment as an answer. You are right about the RefSeq annotation. I also checked the NCBI FTP site for Tetradesmus obliquus strain DOE0152z and agree that there is no GFF file for it.
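
The same FTP check can be scripted. A minimal sketch, assuming the usual genomes/all/<GCA|GCF>/<3 digits>/<3 digits>/<3 digits>/ directory scheme: it only builds the parent directory (the assembly folder name also includes the assembly name, which is not given here), and has_gff is a hypothetical helper that looks for the *_genomic.gff.gz file that annotated assemblies normally carry.

```python
import re

FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def assembly_ftp_dir(accession):
    """Return the genomes/all directory that contains the folder(s) for an assembly."""
    m = re.fullmatch(r"(GC[AF])_(\d{3})(\d{3})(\d{3})(?:\.\d+)?", accession)
    if m is None:
        raise ValueError(f"not a GCA_/GCF_ assembly accession: {accession}")
    prefix, a, b, c = m.groups()
    return f"{FTP_ROOT}/{prefix}/{a}/{b}/{c}/"


def is_refseq(accession):
    """GCF_ accessions are RefSeq assemblies; GCA_ accessions are GenBank assemblies."""
    return accession.startswith("GCF_")


def has_gff(filenames):
    """True if a directory listing includes the annotation file *_genomic.gff.gz."""
    return any(name.endswith("_genomic.gff.gz") for name in filenames)
```

For example, assembly_ftp_dir("GCA_900108755.1") points at .../genomes/all/GCA/900/108/755/, where the assembly folder and its file listing can be inspected.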

4.1 years ago
Arsenal

The previous comments are right about the criteria an assembly must meet to have complementary data such as a GFF file.

BUT

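I think you should check the snpEff documentation (https://pcingola.github.io/SnpEff/se_introduction/), because the variant files it annotates are not GFF files but VCF and/or BED. Annotation such as GFF3, GTF or GenBank is only needed to build a custom snpEff database for a genome that is not already available.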
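
For reference, snpEff can also build a custom genome directly from a GenBank file (snpEff build -genbank reads data/<genome>/genes.gbk), so a separate GFF3 conversion is not strictly required. The sketch below only lays out the directory and the snpEff.config entry and returns the build command; the genome name Tobliquus_cp is hypothetical, it assumes snpEff takes the sequence from the GenBank ORIGIN section, and the GenBank file must actually contain gene/CDS features for the resulting database to be useful.

```python
import shutil
from pathlib import Path


def prepare_snpeff_genome(snpeff_dir, genome, genbank_file):
    """Lay out a custom snpEff genome built from a GenBank file; return the build command."""
    snpeff_dir = Path(snpeff_dir)
    data_dir = snpeff_dir / "data" / genome
    data_dir.mkdir(parents=True, exist_ok=True)
    # snpEff's GenBank build looks for data/<genome>/genes.gbk
    shutil.copy(genbank_file, data_dir / "genes.gbk")

    # Register the genome in snpEff.config once ("<genome>.genome : <description>").
    config = snpeff_dir / "snpEff.config"
    existing = config.read_text() if config.exists() else ""
    key = f"{genome}.genome"
    if not any(line.split(":")[0].strip() == key for line in existing.splitlines()):
        with config.open("a") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(f"{key} : {genome}\n")

    # Running this command requires Java and the snpEff jar in snpeff_dir.
    return ["java", "-jar", str(snpeff_dir / "snpEff.jar"),
            "build", "-genbank", "-v", genome]
```

After a successful build, variants would be annotated with something like java -jar snpEff.jar Tobliquus_cp variants.vcf > variants.ann.vcf (file names hypothetical).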
