I have a reference genome, ref_genome.fna, which I downloaded by its accession number using datasets download genome.
GenBank assembly and RefSeq assembly are identical.
When I use the RefSeq accession number to look it up in the RefSeq database, I can see the features that were annotated: Gene, mRNA, CDS and ncRNA.
When I go to its annotation release (link here), I can see that 25,293 genes are protein coding. The annotation products are also available on the FTP site (link here).
What I want to do is extract the locations of those 25,293 protein-coding genes, together with a gene identifier, into a single file, e.g.:
chrom chromStart chromEnd geneID
chr2 100000 155000 RK031
... ... ... ...
For that purpose, what file do I need to download, and what tool do I need to use?
You can simply download the transcript and protein sequences for Astyanax mexicanus from NCBI.
Otherwise, you can use AGAT (Extracting genomic feature sequences from GTF/GFF files with AGAT) to extract this information from the genome file and its GTF/GFF annotation.
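If you only need the coordinates rather than the sequences, the GFF3 annotation file is enough on its own (the *_genomic.gff.gz among the annotation products on the FTP site; I believe datasets download genome can also include it with --include gff3). Here is a minimal Python sketch, not a tested pipeline. It assumes the usual RefSeq GFF3 layout, where protein-coding genes are rows of type gene carrying gene_biotype=protein_coding in column 9. The function names are my own, and using the Name attribute as the gene identifier is just one choice (Dbxref holds the numeric GeneID if you prefer that).

```python
from urllib.parse import unquote


def parse_attributes(field):
    """Turn a GFF3 column-9 string (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def protein_coding_genes(lines):
    """Yield (chrom, start, end, gene_id) for each protein-coding gene row.

    Assumes RefSeq-style GFF3: feature type 'gene' and an attribute
    gene_biotype=protein_coding. Coordinates are converted from GFF
    (1-based, inclusive) to BED (0-based start, exclusive end).
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_biotype") != "protein_coding":
            continue
        gene_id = attrs.get("Name") or attrs.get("ID", ".")
        yield cols[0], int(cols[3]) - 1, int(cols[4]), gene_id


def write_bed(gff_path, bed_path):
    """Write protein-coding gene locations from an uncompressed GFF3 file
    to a tab-separated, BED-like file; return the number of genes written."""
    count = 0
    with open(gff_path) as gff, open(bed_path, "w") as bed:
        for chrom, start, end, gene_id in protein_coding_genes(gff):
            bed.write(f"{chrom}\t{start}\t{end}\t{gene_id}\n")
            count += 1
    return count
```

Calling write_bed("genomic.gff", "protein_coding_genes.bed") (both file names are placeholders) returns the number of genes written, which you can compare against the 25,293 in the annotation report. Note that the first column will hold the RefSeq sequence accessions used in the GFF (NC_..., NW_...) rather than names like chr2, so rename them afterwards if you need chromosome names.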