Hello,
Curious if anyone has experience using the TSEBRA GTF output in EVM. The GTF file generated by TSEBRA gives an error when converted to GFF3 format for use as input to EVM. I tried using augustus_GTF_to_EVM_GFF3.pl but it doesn't seem to work. Kindly suggest!
Regards,
B
You should mention what the error is; it is hard to comment otherwise. What seems to be the error? It is curious that the code errored out: GTF is such a simple format, closely related to GFF, so conversion is usually trivial. Perhaps the GTF file is not quite right?
Istvan Albert Thanks! I received the following error while validating the GFF3 file converted with "augustus_GTF_to_EVM_GFF3.pl", using the EVM utility "gff3_gene_prediction_file_validator.pl"
look at your GTF and try to see what is going on, for example do a
Sounds like the feature ID needs to be unique. Perhaps you need to remove some lines from the file. It is conceivable that both the gene and the transcript have the same IDs.
It seems the transcript ID and gene ID are listed multiple times and appear to have the same value. Any suggestions on fixing this?
Regards, B
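One way to check for repeated IDs is to count how often each value occurs per feature type. Below is a minimal sketch in plain Python, assuming standard GTF attributes of the form key "value"; in column 9. The function name count_ids and the example lines are made up for illustration and are not taken from TSEBRA output.

```python
import re
from collections import Counter


def count_ids(lines, key="transcript_id"):
    """Count occurrences of each (feature type, attribute value) pair in GTF lines."""
    # Assumes GTF-style attributes such as: transcript_id "g1.t1";
    pattern = re.compile(rf'{re.escape(key)} "([^"]+)"')
    counts = Counter()
    for line in lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; ignore anything else.
        if len(fields) < 9:
            continue
        match = pattern.search(fields[8])
        if match:
            counts[(fields[2], match.group(1))] += 1
    return counts


# Hypothetical example records in which every feature shares the same ID.
example = [
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "g1";\n',
    'chr1\texample\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "g1";\n',
    'chr1\texample\tCDS\t100\t200\t.\t+\t0\tgene_id "g1"; transcript_id "g1";\n',
]

for (feature, value), n in sorted(count_ids(example).items()):
    print(feature, value, n)
```

To run it on a real file, pass an open file handle instead of the example list, for instance count_ids(open("input.gtf")), and repeat with key="gene_id" to compare the two attributes.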
I believe that the problem is that everything has the same ID - what a disaster ... alas not atypical of bioinformatics
You would need to write code that parses the GTF and changes the IDs. I think ChatGPT could do it with ease; let me give it a go. I think it is a good start: tinker with it until it works. Most of it should be fine, but you can run your own ChatGPT session and ask it for refinements.
Response:
You can achieve this by using Python and the pandas library to read the GTF file, rename the IDs, and then save the modified GTF file. Here's a script that should do the trick:
This script assumes that the ID format is like this: ID=element1;Parent=element1_parent. If your GTF file has a different format, you will need to modify the script accordingly. Make sure to replace path/to/your/input.gtf and path/to/your/output.gtf with the actual file paths.
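For illustration, here is a minimal sketch of that kind of ID renaming in plain Python. It is not the script referred to above and does not use pandas. It assumes GFF3-style attributes such as ID=element1;Parent=element1_parent, keeps the first record that uses an ID unchanged so that Parent references still resolve, and appends the feature type plus a counter to later duplicates. The function name make_ids_unique, the suffix scheme, and the example records are all assumptions; it also does not check whether a generated ID collides with one already in the file.

```python
import re

# Matches the ID attribute only (not Parent=...), at the start or after a ';'.
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]+)")


def make_ids_unique(lines):
    """Return GFF3 lines in which repeated ID values get a unique suffix."""
    seen = {}  # ID value -> number of times it has appeared so far
    out = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Pass through comments and anything that is not a 9-column record.
        if line.startswith("#") or len(fields) < 9:
            out.append(line)
            continue
        match = ID_PATTERN.search(fields[8])
        if match:
            old_id = match.group(1)
            n = seen.get(old_id, 0) + 1
            seen[old_id] = n
            if n > 1:
                # Assumed naming scheme: <old ID>.<feature type><counter>
                new_id = f"{old_id}.{fields[2]}{n}"
                attrs = fields[8]
                fields[8] = attrs[:match.start(1)] + new_id + attrs[match.end(1):]
        out.append("\t".join(fields))
    return out


# Hypothetical records in which the exons reuse the transcript's ID.
example = [
    "chr1\texample\tmRNA\t1\t900\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\texample\texon\t1\t400\t.\t+\t.\tID=t1;Parent=t1",
    "chr1\texample\texon\t500\t900\t.\t+\t.\tID=t1;Parent=t1",
]

for fixed in make_ids_unique(example):
    print(fixed)
```

Only the ID attribute is rewritten; Parent values are left as they are, so this helps only when the first record carrying an ID is the one that child features should point to.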
Thanks Istvan Albert for suggesting the script! The idea of using a Python script (with help from ChatGPT) seems interesting; I will give it a try as well. Also, FYI, I solved this issue by simply reformatting the GFF3 to remove redundant information. EVM seems to expect only parent_ID information in the GFF3 file.