Hi there,
I've been analyzing Brassica napus transcriptomic data to study isoform expression and the incidence of splicing events, so I used the Brassica Database (BRAD) GFF3 and FASTA files to generate my STAR index.
After a few errors I managed to get my STAR run working, but downstream software (e.g. rMATS) requires GTF files, and the BRAD GFF3 doesn't seem to be compatible with any GFF3-to-GTF converter.
(I've used gffread and genometools so far).
Has anyone had similar problems with the formatting of these BRAD annotation files?
Example formatting:
chrC03 GazeA2 mRNA 28541218 28543845 572.4227 + . ID=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
chrC03 GazeA2 UTR 28543523 28543845 6.0158 + . Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
chrC03 GazeA2 CDS 28543454 28543522 29.9339 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
chrC03 GazeA2 CDS 28543158 28543369 27.5481 + 1 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
chrC03 GazeA2 CDS 28542958 28543060 27.3743 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001
Columns 1-8 are mostly consistent with sample GFF3 files but I've noticed a large space in the mRNA row between the score and strand columns. Also, the attribute column is different but I don't know if this is an acceptable departure from the norm.
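Both concerns can be checked by splitting a row on tabs and parsing the last column. This is a minimal Python sketch, assuming the file is tab-delimited as GFF3 requires; the function name inspect_gff3_row is made up for illustration and is not part of any tool mentioned here.

```python
def inspect_gff3_row(line):
    """Return (fields, attributes, problems) for one GFF3 feature row."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        problems.append(f"expected 9 tab-separated columns, found {len(fields)}")
        return fields, {}, problems

    # Column 9 is a ';'-separated list of key=value pairs.
    attributes = {}
    for pair in fields[8].strip().split(";"):
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            problems.append(f"attribute without '=': {pair!r}")
            continue
        attributes[key.strip()] = value.strip()

    if "ID" not in attributes and "Parent" not in attributes:
        problems.append("row has neither ID nor Parent")
    return fields, attributes, problems


# The mRNA row from the example above, with explicit tabs between columns.
row = ("chrC03\tGazeA2\tmRNA\t28541218\t28543845\t572.4227\t+\t.\t"
       "ID=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001")
fields, attributes, problems = inspect_gff3_row(row)
print(attributes["ID"], problems)
```

If a row comes back with nine fields, any extra visual gap is just how the tabs are rendered; a row with a different count points to stray or missing tabs.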
I managed to get around this problem in STAR with:
STAR --runMode genomeGenerate --genomeDir $1 --genomeFastaFiles $genfas --sjdbOverhang 99 --sjdbGTFfile $gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS
This seems to be correct, and the subsequent mapping job was successful.
Does anyone have any idea what could be causing this problem, and/or any potential solutions?
Thanks in advance, I've been really wracking my brain.
Probably because there is no gene feature. I guess the converters expect those features.
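To illustrate what is missing, here is a rough Python sketch that builds GTF exon rows from the mRNA, CDS and UTR rows of a file laid out like the example in the question. It rests on assumptions that are mine, not BRAD's: CDS and UTR pieces that touch or overlap belong to the same exon, and since there is no gene level the transcript ID is reused as gene_id. The function names are hypothetical, and a dedicated converter is the safer route for real work.

```python
from collections import defaultdict


def parse_attributes(column):
    """Turn 'ID=x;Parent=y' into a dict."""
    pairs = (item.partition("=") for item in column.strip().split(";") if item)
    return {key: value for key, _, value in pairs}


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) pairs (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gff3_to_gtf_exons(lines):
    """Yield GTF exon rows built from the CDS and UTR children of each mRNA."""
    pieces = defaultdict(list)  # transcript ID -> [(start, end), ...]
    info = {}                   # transcript ID -> (seqid, source, strand)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9:
            continue  # skip malformed rows
        attrs = parse_attributes(f[8])
        if f[2] == "mRNA" and "ID" in attrs:
            info[attrs["ID"]] = (f[0], f[1], f[6])
        elif f[2] in ("CDS", "UTR") and "Parent" in attrs:
            pieces[attrs["Parent"]].append((int(f[3]), int(f[4])))
    for tid, (seqid, source, strand) in info.items():
        for start, end in merge_intervals(pieces.get(tid, [])):
            # Assumption: no gene level exists, so the transcript ID doubles as gene_id.
            attributes = f'gene_id "{tid}"; transcript_id "{tid}";'
            yield "\t".join([seqid, source, "exon", str(start), str(end),
                             ".", strand, ".", attributes])


# Rows from the question, with explicit tabs between columns.
tail = "Name=BnaC03g43490D;Alias=GSBRNA2T00158351001"
rows = [
    f"chrC03\tGazeA2\tmRNA\t28541218\t28543845\t572.4227\t+\t.\tID=BnaC03g43490D;{tail}",
    f"chrC03\tGazeA2\tUTR\t28543523\t28543845\t6.0158\t+\t.\tParent=BnaC03g43490D;{tail}",
    f"chrC03\tGazeA2\tCDS\t28543454\t28543522\t29.9339\t+\t0\tParent=BnaC03g43490D;{tail}",
    f"chrC03\tGazeA2\tCDS\t28543158\t28543369\t27.5481\t+\t1\tParent=BnaC03g43490D;{tail}",
    f"chrC03\tGazeA2\tCDS\t28542958\t28543060\t27.3743\t+\t0\tParent=BnaC03g43490D;{tail}",
]
exons = list(gff3_to_gtf_exons(rows))
for exon in exons:
    print(exon)
```

In this example the UTR starts one base after the neighbouring CDS ends, so the two are merged into a single exon and the five input rows yield three exon rows.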
Over the past years I've had many issues with the different gff3 files you can find everywhere. There is often something missing for the tools that use gff3 as input. So I decided to write a parser that works with any kind of gff (gff, gff2, and all gff3 flavours) and gtf too, and that checks, completes, and corrects the input file in order to create complete and standardized gff3 files. Most of my tools that use gff3 files first pass their input through this parser.
If you want to give it a try, you can find the toolkit, called AGAT, here:
https://github.com/NBISweden/AGAT.git
To install it do:
Then to use the parser, the simplest way is to use this script:
Plenty of other scripts are available: type agat_ and use autocompletion to see all of them.
Hi Jake, I have this gff output file from AUGUSTUS, through BRAKER, but it doesn't seem to conform to the standard file format. I want to rewrite the attribute column. Can any of your scripts do this? Thanks, Kay
Using agat_convert_sp_gxf2gxf.pl you will end up with a full and standardized gff3 file. It deals well with the weird Augustus output.
If you wish to manipulate the attributes in a specific way, you can try the script called:
agat_sp_manage_attributes.pl
The "parent" is missing, probably a gene, as @Juke-34 says.