Hi,
I'm trying to align my RNA-seq data to an E. coli reference genome that I downloaded from Ensembl Bacteria, but I'm stuck because I need the annotation in GTF format. I can only download it as GFF3, and I cannot convert GFF3 to GTF because gffread does not work ("Uncaught exception in exposed API method:").
Does anyone know how I can either align my data to the most current E. coli reference genome or convert my GFF3 to GTF without gffread?
I have previously tried to convert an Ensembl GTF to the correct format ("https://usegalaxy.org/u/jeremy/p/transcriptome-analysis-faq"), but it gives me tabular output. Can I change this?
Thanks!
GFF3 is supposed to be a backwards-compatible specification of the GFF tabular format, and GTF, as far as I understand it, is similar to GFF2. So why do you need to convert GFF3 to GTF, and why is tabular output not the correct one?
When I use Cufflinks with the GFF3 file, it gives the correct gene annotation but with incorrect gene names, e.g. instead of the RpoS gene, it gives me "transcript:AAC75783". Also, Cufflinks will not recognise my .tab file. Do you know how I can rearrange the GFF3 file so that genes are identified by their names instead of by transcript ID?
Could it be because your file doesn't have a .gtf or .gff extension? If you're using files from Ensembl, those should be in GTF format with the proper .gtf extension. If you're using GFF3, do you have a gene_name attribute?
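If gffread really is not an option, here is a rough Python sketch of a GFF3-to-GTF conversion that carries the gene name across. It is not a full converter and rests on assumptions about your file that you should check first: gene features have an ID and a Name attribute, transcript features have an ID and a Parent pointing at the gene, and exon/CDS features have a Parent pointing at the transcript. The function names (parse_attributes, gff3_to_gtf) are made up for this example.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff3_to_gtf(gff3_lines):
    """Return GTF lines (no trailing newline) for exon and CDS features."""
    rows = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            rows.append((cols, parse_attributes(cols[8])))

    # First pass: features with an ID but no Parent are treated as genes
    # (assumption); features with both are remembered as child -> parent.
    gene_name = {}
    parent_of = {}
    for cols, attrs in rows:
        feature_id = attrs.get("ID")
        if feature_id is None:
            continue
        if "Parent" in attrs:
            parent_of[feature_id] = attrs["Parent"].split(",")[0]
        else:
            gene_name[feature_id] = attrs.get("Name", feature_id)

    # Second pass: write one GTF line per exon/CDS and parent transcript.
    gtf_lines = []
    for cols, attrs in rows:
        if cols[2] not in ("exon", "CDS") or "Parent" not in attrs:
            continue
        for transcript_id in attrs["Parent"].split(","):
            gene_id = parent_of.get(transcript_id, transcript_id)
            name = gene_name.get(gene_id, gene_id)
            column9 = 'gene_id "{}"; transcript_id "{}"; gene_name "{}";'.format(
                gene_id, transcript_id, name
            )
            gtf_lines.append("\t".join(cols[:8] + [column9]))
    return gtf_lines
```

To use it, read the GFF3 file line by line, pass the lines to gff3_to_gtf, and write the result to a file with a .gtf extension, one line per entry. The IDs are copied as they are, so any "gene:" or "transcript:" prefixes stay in gene_id and transcript_id; only gene_name gets the value of the Name attribute.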
Yes, it worked with the GFF3 file! Thanks for your help.