I am trying to use gffcompare to compare my assembled transcriptome to a reference gtf that contains information about small open reading frames (sORFs). The reference gtf was obtained by processing downloaded data from several sORF databases and is called all_38.gtf.
I used the command: gffcompare -r path/to/all_38.gtf -o /output_folder my_transcriptome.gtf
my_transcriptome.gtf was generated by assembling my BAM files against the Ensembl hg38 v99 reference.
Upon completion, the message log reads:
0 reference transcripts loaded.
237788 query transfrags loaded.
2714 duplicate query transfrags discarded.
The expected output files are generated, except for the .refmap file, which I need for downstream analysis. I assume this is because 0 reference transcripts were loaded into the program. Has anyone encountered a similar issue? Is something possibly wrong with my custom reference gtf file?
The reference gtf file can be found attached to this GitHub issue.
Give us a sample of your gtf file.
The gtf used as the reference can be found here.
Let me know if the link doesn't work.
The first few lines of my input my_transcriptome.gtf are:
Your problem is most likely that the sequence identifiers (1st column) in your file do not match any of the sequence identifiers in the reference.
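One way to test this quickly is to collect the first column of both files and compare the two sets. The following is a minimal sketch, not part of gffcompare; the function names `seqids` and `compare_seqids` are made up for illustration, and it assumes both files are plain (uncompressed) tab-separated GTF.

```python
def seqids(gtf_path):
    """Return the set of sequence identifiers (column 1) found in a GTF file."""
    ids = set()
    with open(gtf_path) as handle:
        for line in handle:
            # Skip blank lines and comment/header lines.
            if not line.strip() or line.startswith("#"):
                continue
            ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def compare_seqids(reference_gtf, query_gtf):
    """Report which sequence identifiers are shared or unique to each file."""
    ref = seqids(reference_gtf)
    qry = seqids(query_gtf)
    return {
        "shared": ref & qry,
        "reference_only": ref - qry,
        "query_only": qry - ref,
    }


# Example call with the file names from this thread:
# report = compare_seqids("path/to/all_38.gtf", "my_transcriptome.gtf")
# for key, values in report.items():
#     print(key, sorted(values)[:10])
```

If `shared` comes back empty (for example `chr1` in one file and `1` in the other), the identifiers need to be harmonised before the comparison can work.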
Sorry, I realised I pasted an older version of my_transcriptome.gtf.
The updated gtf looks like this:
The chromosome identifiers should be the same now. I ran gffcompare with both the old and the new version of my_transcriptome.gtf, but ran into the same error each time.
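Since the identifiers now match but the reference still loads as empty, another check that may help is to look at what the reference file actually contains. The sketch below tallies the feature types in column 3 and counts how many records of each type carry a `transcript_id` attribute. It rests on an assumption that is not confirmed anywhere in this thread: that gffcompare builds reference transcripts from exon (or CDS) records with a `transcript_id`, so a custom file lacking those might load as 0 transcripts. Please verify that against the gffcompare documentation. The function name `summarize_features` is made up for illustration.

```python
from collections import Counter


def summarize_features(gtf_path):
    """Count feature types (column 3) and how many of each have a transcript_id."""
    feature_counts = Counter()
    with_transcript_id = Counter()
    with open(gtf_path) as handle:
        for line in handle:
            # Skip blank lines and comment/header lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # A GTF record needs 9 tab-separated columns; flag anything shorter.
            if len(fields) < 9:
                feature_counts["<malformed>"] += 1
                continue
            feature = fields[2]
            feature_counts[feature] += 1
            # Simple substring check on the attribute column (column 9).
            if "transcript_id" in fields[8]:
                with_transcript_id[feature] += 1
    return feature_counts, with_transcript_id


# Example call with the reference file from this thread:
# counts, with_tid = summarize_features("path/to/all_38.gtf")
# for feature, n in counts.most_common():
#     print(feature, n, "with transcript_id:", with_tid[feature])
```

A large `<malformed>` count would suggest the file is space-separated instead of tab-separated, which is a common side effect of building a GTF by hand from database exports.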