Hello
Background
I'm trying to analyse an RNA-seq experiment of Bacillus subtilis PY79. As part of that, I need to create an Ensembl-style annotation database (EnsDb) using ensembldb (https://bioconductor.org/packages/release/bioc/html/ensembldb.html). For this I need a GFF file of the genome. I tried to download one from NCBI, but I get an error because that GFF file lacks "gene_id". Since I cannot find any other GFF file for that strain, I am now trying to generate one from the GenBank (gb) file.
The Problem
I have a GenBank (gb) file which I downloaded from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/NC_022898.1?report=genbank, then Send to -> File -> GenBank (full)). I wish to convert it to a GFF3 file. I have attempted several things, but none succeeded.
What I've Tried
- I tried the Julia package BioJulia/GenomicAnnotations.jl, but ran into an error (https://github.com/BioJulia/GenomicAnnotations.jl/issues/7). I have also searched earlier cases of the same query (https://www.biostars.org/p/134589/).
- I have managed to download bp_genbank2gff3.pl, but found no instructions that helped me to run it properly.
- I tried to find gff-tools. I went to https://bmi.inf.ethz.ch/supplements/gff-tools, but only found a link to a place where it can be run online, and I'm not able to add any input to the online interface.
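For orientation, the core of such a conversion is mapping each GenBank feature (location plus qualifiers) to one nine-column GFF3 line. Below is a minimal standard-library sketch of that mapping only. It assumes simple locations of the form `a..b` or `complement(a..b)`, it does not parse the GenBank file itself, and the function names, seqid, and attribute values are made up for illustration. Whether ensembldb accepts a `gene_id` attribute written this way is an assumption, not something verified here.

```python
import re

# Characters that have to be percent-encoded inside GFF3 attribute values.
_RESERVED = {c: f"%{ord(c):02X}" for c in "%;=&,\t\n\r"}


def escape(value):
    """Percent-encode only the characters GFF3 reserves in column 9."""
    return "".join(_RESERVED.get(c, c) for c in value)


def parse_simple_location(location):
    """Return (start, end, strand) for 'a..b' or 'complement(a..b)'.

    GenBank and GFF3 both use 1-based, inclusive coordinates,
    so the numbers carry over unchanged.
    """
    strand = "+"
    m = re.fullmatch(r"complement\((.+)\)", location)
    if m:
        strand = "-"
        location = m.group(1)
    m = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    if m is None:
        # join(...), fuzzy ends (<, >) etc. are out of scope for this sketch
        raise ValueError(f"unsupported location: {location}")
    return int(m.group(1)), int(m.group(2)), strand


def gff3_line(seqid, source, ftype, location, attributes, phase="."):
    """Build one tab-separated GFF3 line from feature fields."""
    start, end, strand = parse_simple_location(location)
    attr = ";".join(f"{key}={escape(val)}" for key, val in attributes.items())
    return "\t".join(
        [seqid, source, ftype, str(start), str(end), ".", strand, phase, attr]
    )


# Made-up feature, not taken from the actual record.
line = gff3_line(
    "seq1", "GenBank", "gene", "complement(100..250)",
    {"ID": "gene1", "gene_id": "gene1", "Name": "abc;D"},  # hypothetical values
)
print(line)
```

A real converter also has to handle joined locations, partial features, and CDS phase, which is why an established tool is preferable once one can be made to run.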
The GFF file for this strain does have a `gene` identifier. You should be able to use that for your counting with `featureCounts`. This is bacterial RNA-seq, so things are simpler. Align with the aligner of your choice and then use `featureCounts` with the `-g gene` option. If you choose this file, be sure to get the corresponding genome FASTA file to create your indexes. That way all identifiers will match.