modify gff file
6.3 years ago

Hi, I want to modify my GFF file.

Right now it is in this form:

Ca_LG_1 EVM gene    41278   42503   .   -   .   ID=Ca_00005;
Ca_LG_1 EVM mRNA    41278   42503   .   -   .   ID=Ca_00005.1;Parent=Ca_00005;
Ca_LG_1 EVM exon    42292   42503   .   -   .   ID=Ca_00005.1.exon1;Parent=Ca_00005.1;
Ca_LG_1 EVM CDS 42292   42503   .   -   0   ID=Ca_00005.1.cds1;Parent=Ca_00005.1;
Ca_LG_1 EVM exon    41379   41745   .   -   .   ID=Ca_00005.1.exon2;Parent=Ca_00005.1;
Ca_LG_1 EVM CDS 41379   41745   .   -   2   ID=Ca_00005.1.cds2;Parent=Ca_00005.1;
Ca_LG_1 EVM exon    41278   41304   .   -   .   ID=Ca_00005.1.exon3;Parent=Ca_00005.1;
Ca_LG_1 EVM CDS 41278   41304   .   -   0   ID=Ca_00005.1.cds3;Parent=Ca_00005.1;
Ca_LG_1 EVM gene    71881   72641   .   +   .   ID=Ca_00006;
Ca_LG_1 EVM mRNA    71881   72641   .   +   .   ID=Ca_00006.1;Parent=Ca_00006;
Ca_LG_1 EVM five_prime_UTR  71881   71905   .   +   .   ID=Ca_00006.1.utr5p1;Parent=Ca_00006.1;
Ca_LG_1 EVM exon    71881   72641   .   +   .   ID=Ca_00006.1.exon1;Parent=Ca_00006.1;
Ca_LG_1 EVM CDS 71906   72481   .   +   0   ID=Ca_00006.1.cds1;Parent=Ca_00006.1;
Ca_LG_1 EVM three_prime_UTR 72482   72641   .   +   .   ID=Ca_00006.1.utr3p1;Parent=Ca_00006.1;
Ca_LG_1 EVM gene    73915   74216   .   -   .   ID=Ca_00007;
Ca_LG_1 EVM mRNA    73915   74216   .   -   .   ID=Ca_00007.1;Parent=Ca_00007;
Ca_LG_1 EVM exon    74113   74216   .   -   .   ID=Ca_00007.1.exon1;Parent=Ca_00007.1;
Ca_LG_1 EVM CDS 74113   74216   .   -   0   ID=Ca_00007.1.cds1;Parent=Ca_00007.1;
Ca_LG_1 EVM exon    73915   74008   .   -   .   ID=Ca_00007.1.exon2;Parent=Ca_00007.1;
Ca_LG_1 EVM CDS 73915   74008   .   -   2   ID=Ca_00007.1.cds2;Parent=Ca_00007.1;

and I want

1   araport11   gene    3631    5899    .   +   .   gene_id "AT1G01010"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding";
1   araport11   transcript  3631    5899    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding";
1   araport11   exon    3631    3913    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon1";
1   araport11   CDS 3760    3913    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1"; protein_version "1";
1   araport11   start_codon 3760    3762    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding";
1   araport11   exon    3996    4276    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "2"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon2";
1   araport11   CDS 3996    4276    .   +   2   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "2"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1"; protein_version "1";
1   araport11   exon    4486    4605    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "3"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon3";
1   araport11   CDS 4486    4605    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "3"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1"; protein_version "1";
1   araport11   exon    4706    5095    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "4"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon4";
1   araport11   CDS 4706    5095    .   +   0   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "4"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; protein_id "AT1G01010.1"; protein_version "1";
1   araport11   exon    5174    5326    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "5"; gene_name "NAC001"; gene_source "araport11"; gene_biotype "protein_coding"; transcript_source "araport11"; transcript_biotype "protein_coding"; exon_id "AT1G01010.1.exon5";

in this format

I am new to bioinformatics; kindly help.


Hello manishbiotechie,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

fin swimmer


For future reference, please try to use more descriptive titles (e.g. the nature of the “modification” you want to make and so on).

6.3 years ago

Hello manishbiotechie,

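It looks like you're trying to convert GFF to GTF. Your GFF does not contain enough information to reproduce the exact output in your example: it only has ID and Parent attributes, so fields such as gene_name and gene_biotype cannot be derived from it.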

You can try gffread for conversion:

$ gffread input.gff -T -o output.gtf

This will give you this output:

Ca_LG_1 EVM exon    41278   41304   .   -   .   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM exon    41379   41745   .   -   .   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM exon    42292   42503   .   -   .   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM CDS 41278   41304   .   -   0   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM CDS 41379   41745   .   -   1   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM CDS 42292   42503   .   -   0   transcript_id "Ca_00005.1"; gene_id "Ca_00005";
Ca_LG_1 EVM exon    71881   72641   .   +   .   transcript_id "Ca_00006.1"; gene_id "Ca_00006";
Ca_LG_1 EVM CDS 71906   72481   .   +   0   transcript_id "Ca_00006.1"; gene_id "Ca_00006";
Ca_LG_1 EVM exon    73915   74008   .   -   .   transcript_id "Ca_00007.1"; gene_id "Ca_00007";
Ca_LG_1 EVM exon    74113   74216   .   -   .   transcript_id "Ca_00007.1"; gene_id "Ca_00007";
Ca_LG_1 EVM CDS 73915   74008   .   -   1   transcript_id "Ca_00007.1"; gene_id "Ca_00007";
Ca_LG_1 EVM CDS 74113   74216   .   -   0   transcript_id "Ca_00007.1"; gene_id "Ca_00007";

Otherwise, check whether the place where you got your GFF also offers a GTF.
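
If no GTF is available and you also need the gene and transcript rows from your example, a small script can build them from the ID and Parent attributes. The following is only a minimal sketch, not what gffread does internally. The function names (`parse_attributes`, `format_attributes`, `gff3_to_gtf`) are made up for illustration. It assumes a tab-delimited GFF3 in which every parent line comes before its children, and it numbers exons in the order they appear in the file. It cannot add gene_name, biotype or source attributes, because they are not in the GFF.

```python
def parse_attributes(column):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'}."""
    pairs = (item.split("=", 1) for item in column.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def format_attributes(pairs):
    """Render (key, value) pairs in GTF style: key "value"; key "value";"""
    return " ".join(f'{key} "{value}";' for key, value in pairs)


def gff3_to_gtf(lines):
    """Convert GFF3 lines to GTF-style lines (hypothetical helper)."""
    gene_of = {}  # transcript ID -> gene ID
    exons = {}    # transcript ID -> [(start, end, exon_number), ...]
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        ftype = cols[2]
        start, end = int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "gene":
            pairs = [("gene_id", attrs["ID"])]
        elif ftype in ("mRNA", "transcript"):
            gene_of[attrs["ID"]] = attrs["Parent"]
            cols[2] = "transcript"  # the feature name used in the wanted output
            pairs = [("gene_id", attrs["Parent"]), ("transcript_id", attrs["ID"])]
        else:
            # exon, CDS, UTR: assumes the parent transcript was already seen
            tid = attrs["Parent"]
            pairs = [("gene_id", gene_of[tid]), ("transcript_id", tid)]
            if ftype == "exon":
                # assumption: exons are listed in transcript order in the file
                number = len(exons.setdefault(tid, [])) + 1
                exons[tid].append((start, end, number))
                pairs += [("exon_number", number), ("exon_id", attrs["ID"])]
            elif ftype == "CDS":
                # reuse the number of the exon that contains this CDS segment
                for e_start, e_end, number in exons.get(tid, []):
                    if e_start <= start and end <= e_end:
                        pairs.append(("exon_number", number))
                        break
        out.append("\t".join(cols[:8] + [format_attributes(pairs)]))
    return out
```

You could run it with something like `print("\n".join(gff3_to_gtf(open("input.gff"))))`, where `input.gff` stands for your own file. UTR lines keep their GFF feature names, and the phase column is copied unchanged.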

fin swimmer
