5.0 years ago
arunprasanna83
I have a GFF file generated by BRAKER. It assigns default gene names like the following:
# start gene g1
CC151 AUGUSTUS gene 5487 15014 0.36 - . g1
CC151 AUGUSTUS transcript 5487 15014 0.36 - . g1.t1
CC151 AUGUSTUS terminal 5487 5697 1 - 1 transcript_id "g1.t1"; gene_id "g1";
CC151 AUGUSTUS intron 6385 6467 1 - 2 transcript_id "g1.t1"; gene_id "g1";
CC151 AUGUSTUS intron 6550 6622 1 - 0 transcript_id "g1.t1"; gene_id "g1";
CC151 AUGUSTUS intron 6714 6854 1 - 0 transcript_id "g1.t1"; gene_id "g1";
CC151 AUGUSTUS CDS 6998 7110 1 - 2 transcript_id "g1.t1"; gene_id "g1";
CC151 AUGUSTUS CDS 7888 7941 1 - 0 transcript_id "g1.t1"; gene_id "g1";
# end gene g1
I wish to change the default g1 to a custom name, for example "CC151_gene1". I created a list of all gene IDs with their corresponding replacement names and tried the following:
g1 CC151_gene1
g2 CC151_gene2
grep -f gene.replacement.txt mygfffile.gff > replaced.gfffile.gff
However, my original file was not modified. Can anyone suggest a better method?
Thanks in advance.
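A note on why the command above does nothing useful here: grep -f only selects lines that match the patterns in the list; it never substitutes text, and it treats each whole line of the list ("g1 CC151_gene1") as one pattern. Below is a minimal Python sketch of a substitution instead, assuming the list is whitespace-separated "old new" pairs as shown above. The function names (load_mapping, rename_line, rename_file) are made up for illustration, and the regular expression assumes gene IDs always look like g followed by digits, as in the BRAKER output in the question.

```python
import re

# Whole-token gene ID: "g" plus digits, bounded by non-word characters.
# This matches the g1 in g1.t1 but not the g1 inside g10.
GENE_ID = re.compile(r"\bg\d+\b")


def load_mapping(lines):
    """Parse 'old_id new_id' pairs (one pair per line) into a dict."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def rename_line(line, mapping):
    """Replace every known gene ID in one line; unknown IDs are left as-is."""
    return GENE_ID.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)


def rename_file(in_path, out_path, mapping_path):
    """Write a renamed copy of in_path to out_path; the input is not modified."""
    with open(mapping_path) as fh:
        mapping = load_mapping(fh)
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rename_line(line, mapping))


# Small in-memory demonstration using lines from the question.
demo_mapping = load_mapping(["g1 CC151_gene1", "g2 CC151_gene2"])
demo_lines = [
    "# start gene g1",
    "CC151\tAUGUSTUS\ttranscript\t5487\t15014\t0.36\t-\t.\tg1.t1",
    'CC151\tAUGUSTUS\tCDS\t6998\t7110\t1\t-\t2\ttranscript_id "g1.t1"; gene_id "g1";',
]
for demo_line in demo_lines:
    print(rename_line(demo_line, demo_mapping))
```

With the file names from the question, the call would be rename_file("mygfffile.gff", "replaced.gfffile.gff", "gene.replacement.txt"). The sketch rewrites the gene and transcript columns, the attribute fields, and the "# start gene" comments alike, so check the result on a few genes before relying on it.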
Hi Juke, thanks for suggesting AGAT. It does work, but the problem is that the generated names are too long. For instance, I have 28540 genes in total, but I get gene names like M000000000001, which is about 12 digits.
Yes, I implemented it that way to follow what Ensembl does. Now that your file has been standardized by AGAT, what you could do is use
agat_sq_manage_ID.pl
(Do not use this script on your original file, because it expects a properly formatted GFF3. All scripts with the sq prefix need a proper GFF3 file.)
I have fixed it in AGAT version 0.1.0.
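On the length concern raised above: 28540 genes need only five digits to stay unique and sortable. The following is a small Python sketch of that arithmetic, independent of AGAT; make_gene_ids is a hypothetical helper and the "CC151_gene" naming pattern is taken from the example in the question, not from any AGAT option.

```python
def make_gene_ids(prefix, n_genes):
    """Build zero-padded IDs using the fewest digits that fit n_genes."""
    width = len(str(n_genes))  # 28540 genes -> 5 digits
    return [f"{prefix}_gene{i:0{width}d}" for i in range(1, n_genes + 1)]


ids = make_gene_ids("CC151", 28540)
print(ids[0], ids[-1])
```

Zero-padding keeps the names in gene order when sorted as text; drop the padding if plain names like CC151_gene1 are preferred.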