Hi! I'm trying to extract exon and splice site information from a GFF file (downloaded from NCBI: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000214255.1_Bter_1.0/GCF_000214255.1_Bter_1.0_genomic.gff.gz) using the Python scripts provided in the HISAT2 package (extract_splice_sites.py and extract_exons.py), for downstream analysis starting with HISAT2.
The scripts work fine on the example GTF file, which I downloaded as a supplementary file from the Nature Protocols article (http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html). But when I run them on the GFF file, they finish without any error and the output files are empty.
I guess it is because of the GTF format. However, the information the scripts should extract from a GTF file seems to be present in the same columns of my GFF file, so I thought the scripts might work on it as well. I simply tried renaming *.gff to *.gtf, but the scripts again produced empty files.
I'd be thankful for any suggestions on how to extract the exon and splice site information from the GFF file!
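A likely reason renaming does not help: GTF and GFF3 share the first eight columns but encode the ninth (attributes) column differently. GTF uses `key "value";` pairs such as `transcript_id "t1";`, while GFF3 uses `key=value` pairs such as `Parent=rna-1`. The minimal sketch below assumes the HISAT2 scripts read the attribute column GTF-style and skip exons that have no `transcript_id`. It is an illustration, not the scripts' actual code; the helper names and the two example lines are hypothetical.

```python
def parse_gtf_attributes(field):
    """Parse a GTF attribute column like 'gene_id "g1"; transcript_id "t1";'.

    Hypothetical helper: splits on ';', then splits each item at the first
    space into key and quoted value.
    """
    attrs = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip('"')
    return attrs


def transcript_id_of(line):
    """Return the transcript_id of an exon line, or None if it can't be found."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "exon":
        return None
    return parse_gtf_attributes(cols[8]).get("transcript_id")


# Made-up example lines: same first 8 columns, different attribute syntax.
gtf_line = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
gff3_line = "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=exon-1;Parent=rna-1;gene=g1"

print(transcript_id_of(gtf_line))   # t1
print(transcript_id_of(gff3_line))  # None: 'ID=exon-1' etc. never yield a 'transcript_id' key
```

Under this assumption, every GFF3 exon is silently skipped, which would give exactly the empty-but-error-free output described above. Changing the file extension leaves the attribute syntax untouched.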
Thanks! I used gffread to convert my GFF to GTF, and the scripts processed the new GTF without any apparent problem. The output files with the extracted exon and splice site information look reasonable.
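For anyone curious what such a conversion involves, here is a toy sketch of the core idea. It is not how gffread works internally, and for real data gffread (whose `-T` option writes GTF output) is the safer choice. The sketch assumes GFF3 exons point to their transcript through `Parent`, and transcripts point to their gene through their own `Parent`. It rewrites each exon with GTF-style `gene_id`/`transcript_id` attributes. The function names are hypothetical, and it ignores GFF3 percent-encoding and features other than exons.

```python
def parse_gff3_attributes(field):
    """Parse a GFF3 attribute column like 'ID=rna-1;Parent=gene-1' (hypothetical helper)."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_exons_to_gtf(lines):
    """Toy GFF3 -> GTF conversion of exon lines (hypothetical, illustration only)."""
    records = [l.rstrip("\n").split("\t") for l in lines
               if l.strip() and not l.startswith("#")]
    records = [c for c in records if len(c) == 9]

    # First pass: map each non-exon feature ID (e.g. an mRNA) to its Parent (e.g. a gene).
    tx_to_gene = {}
    for cols in records:
        attrs = parse_gff3_attributes(cols[8])
        if cols[2] != "exon" and "ID" in attrs and "Parent" in attrs:
            tx_to_gene[attrs["ID"]] = attrs["Parent"]

    # Second pass: emit each exon once per parent transcript, with GTF-style attributes.
    out = []
    for cols in records:
        if cols[2] != "exon":
            continue
        for tx in parse_gff3_attributes(cols[8]).get("Parent", "").split(","):
            if not tx:
                continue
            gene = tx_to_gene.get(tx, tx)  # fall back to the transcript ID if no gene is known
            out.append("\t".join(cols[:8] + [f'gene_id "{gene}"; transcript_id "{tx}";']))
    return out


# Made-up GFF3 input: one gene, one mRNA, two exons.
gff3 = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=gene-1",
    "chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=rna-1;Parent=gene-1",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=exon-1;Parent=rna-1",
    "chr1\tsrc\texon\t300\t500\t.\t+\t.\tID=exon-2;Parent=rna-1",
]
for line in gff3_exons_to_gtf(gff3):
    print(line)
```

Each printed exon line now ends with `gene_id "gene-1"; transcript_id "rna-1";`, the attribute form a GTF-style parser expects.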
Good to know that it worked for you.
Jf