Rename GFF3 Sequence Names
18 months ago
pthom010 ▴ 40

I have a GFF3 file that looks like this:

X_Chr1  maker   exon    225515  226772  .   -   .   ID=X-6_Chr1v1_00045.1:13;Parent=X_Chr1v1_00045.1
X_Chr1  maker   exon    227294  227414  .   -   .   ID=X-6_Chr1v1_00045.1:12;Parent=X_Chr1v1_00045.1
X_Chr1  maker   exon    227583  227973  .   -   .   ID=X-6_Chr1v1_00045.1:11;Parent=X_Chr1v1_00045.1
X_Chr1  maker   exon    228164  228232  .   -   .   ID=X-6_Chr1v1_00045.1:10;Parent=X_Chr1v1_00045.1

I would like to take the ID value (e.g. ID=X-6_Chr1v1_00045.1, the part before the colon) for each of these genes and make that the new name of the gene (the first column). Could anybody point me to a package or code that can do this?

gff3 • 1.2k views

Please edit your post and use an accurate title: you're renaming sequence names in a GFF file and modifying its content, not renaming a GFF3 file. The former is a bioinformatics problem; the latter is a basic computer science task.


AGAT contains a script (agat_sp_manage_IDs.pl) to manage your IDs. I don't know if it can do exactly what you want but it may help you.

18 months ago

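You could use sed to do this. The command captures the ID value up to the first colon and writes it over the first column; the lines after the command show the resulting contents of out.gff3.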

sed -E 's/(^\S+)(.+ID=)([^:]+)(:.+)/\3\2\3\4/' in.gff3 > out.gff3
X-6_Chr1v1_00045.1  maker   exon    225515  226772  .   -   .   ID=X-6_Chr1v1_00045.1:13;Parent=X_Chr1v1_00045.1
X-6_Chr1v1_00045.1  maker   exon    227294  227414  .   -   .   ID=X-6_Chr1v1_00045.1:12;Parent=X_Chr1v1_00045.1
X-6_Chr1v1_00045.1  maker   exon    227583  227973  .   -   .   ID=X-6_Chr1v1_00045.1:11;Parent=X_Chr1v1_00045.1
X-6_Chr1v1_00045.1  maker   exon    228164  228232  .   -   .   ID=X-6_Chr1v1_00045.1:10;Parent=X_Chr1v1_00045.1
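If you would rather not rely on a regex over the whole line, here is a rough Python sketch of the same idea. It assumes the file is tab-delimited with nine columns (as GFF3 normally is) and that the part of the ID before the first colon is what you want in column 1. The names `rename_seqid` and `rename_file` are made up for illustration and do not come from any package. Unlike the sed command, it passes comment and directive lines through unchanged, and uses the whole ID when there is no colon.

```python
import re

# Capture the ID attribute value up to the first ':' or ';' in column 9.
ID_PATTERN = re.compile(r"(?:^|;)ID=([^:;]+)")


def rename_seqid(line: str) -> str:
    """Return the line with column 1 replaced by the ID value (text before ':').

    Hypothetical helper: assumes tab-delimited, nine-column GFF3 feature lines.
    """
    # Leave comments, directives, and blank lines untouched.
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # Anything that is not a nine-column feature line is passed through as-is.
    if len(fields) != 9:
        return line
    match = ID_PATTERN.search(fields[8])
    if match:
        fields[0] = match.group(1)
    return "\t".join(fields) + newline


def rename_file(in_path: str, out_path: str) -> None:
    """Stream in_path to out_path, rewriting column 1 of every feature line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(rename_seqid(line))


# Quick check on the first record from the question (tabs assumed between columns).
example = (
    "X_Chr1\tmaker\texon\t225515\t226772\t.\t-\t.\t"
    "ID=X-6_Chr1v1_00045.1:13;Parent=X_Chr1v1_00045.1"
)
print(rename_seqid(example))
```

To process a whole file you would call `rename_file("in.gff3", "out.gff3")`. Keep in mind that changing column 1 means the coordinates no longer refer to the original sequence X_Chr1, so check that downstream tools expect this.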