Using gffread to convert a GFF file to GTF
6.5 years ago
worarado.kan ▴ 20

Hello everyone,

I am trying to convert a GFF file to GTF for analysis with HISAT.

When I run the following command:

gffread Blueberry.gff -T -o Blueberry.gtf

I get this error:

GFF Error: overlapping duplicate gene feature (ID=CUFF.71)
GFF Error: overlapping duplicate gene feature (ID=CUFF.168)
GFF Error: overlapping duplicate gene feature (ID=CUFF.279)
GFF Error: overlapping duplicate gene feature (ID=CUFF.330)
...

What am I doing wrong? Please give me some guidance.

Thank you so much

RNA-Seq next-gen sequencing • 5.3k views

Hello, please run:

grep -e "CUFF.71" Blueberry.gff

grep -e "CUFF.168" Blueberry.gff

grep -e "CUFF.279" Blueberry.gff

grep -e "CUFF.330" Blueberry.gff

Then paste the output here.
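One caveat with the grep commands above: the pattern CUFF.71 also matches longer IDs such as CUFF.710, and the dot matches any character. If that gets noisy, a small Python sketch like the one below matches the ID or Parent attribute exactly. It assumes a tab-separated GFF3 file with key=value attributes in column 9; the function names are made up for illustration and are not part of gffread.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def lines_for_id(gff_lines, feature_id):
    """Return the records whose ID, or one of whose Parents, equals feature_id."""
    hits = []
    for line in gff_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a standard 9-column record
        attrs = parse_attributes(columns[8])
        # A feature may list several parents separated by commas.
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") == feature_id or feature_id in parents:
            hits.append(line.rstrip("\n"))
    return hits
```

With the file from the question, something like lines_for_id(open("Blueberry.gff"), "CUFF.71") should return only the records for that exact ID and its direct children.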


I have the same issue when trying to convert GFF3 files to GTF with gffread. I am new to RNA-seq data analysis and am trying to turn my BAM files into raw counts so I can use DESeq2 to find DEGs. Below is the error message I get after running this command:

gffread -E file.gff -T -o file.gtf

"GFF Error: overlapping duplicate mRNA feature (ID=mRNA20293)
GFF Warning: duplicate feature ID gene7319 (1048320-1050627) (discontinuous feature?)
GFF Error: overlapping duplicate mRNA feature (ID=mRNA20294)
GFF Error: overlapping duplicate mRNA feature (ID=mRNA20295)"

"Error parsing value of GFF attribute "Name=", line:
scaffold_6  .   gene    1597619 1612316 .   -   .   ID=gene28515;Name=
Abort trap: 6
"

I hope I can get some help.

I ran grep -e "mRNA20294" on my file and got the output below:

scaffold_33 .   exon    612943  613025  .   -   .   ID=exon83889;Parent=mRNA20294
scaffold_33 .   exon    613104  613211  .   -   .   ID=exon83890;Parent=mRNA20294
scaffold_33 .   exon    613277  613340  .   -   .   ID=exon83891;Parent=mRNA20294
scaffold_33 .   exon    613404  613561  .   -   .   ID=exon83892;Parent=mRNA20294
scaffold_33 .   exon    613626  613728  .   -   .   ID=exon83893;Parent=mRNA20294
scaffold_33 .   exon    615310  615321  .   -   .   ID=exon83894;Parent=mRNA20294
scaffold_33 .   CDS 612943  613025  .   -   0   ID=CDS83106;Parent=mRNA20294
scaffold_33 .   CDS 613104  613211  .   -   0   ID=CDS83107;Parent=mRNA20294
scaffold_33 .   CDS 613277  613340  .   -   0   ID=CDS83108;Parent=mRNA20294
scaffold_33 .   CDS 613404  613561  .   -   0   ID=CDS83109;Parent=mRNA20294
scaffold_33 .   CDS 613626  613728  .   -   0   ID=CDS83110;Parent=mRNA20294
scaffold_33 .   CDS 615310  615321  .   -   0   ID=CDS83111;Parent=mRNA20294
scaffold_29 .   mRNA    1184259 1185319 .   -   .   ID=mRNA20294;Parent=gene20294;Name=SNAP_00012882
scaffold_29 .   exon    1184259 1184277 .   -   .   ID=exon98427;Parent=mRNA20294
scaffold_29 .   exon    1184412 1184887 .   -   .   ID=exon98428;Parent=mRNA20294
scaffold_29 .   exon    1184942 1185238 .   -   .   ID=exon98429;Parent=mRNA20294
scaffold_29 .   exon    1185305 1185319 .   -   .   ID=exon98430;Parent=mRNA20294
scaffold_29 .   CDS 1184259 1184277 .   -   0   ID=CDS98427;Parent=mRNA20294
scaffold_29 .   CDS 1184412 1184887 .   -   0   ID=CDS98428;Parent=mRNA20294
scaffold_29 .   CDS 1184942 1185238 .   -   0   ID=CDS98429;Parent=mRNA20294
scaffold_29 .   CDS 1185305 1185319 .   -   0   ID=CDS98430;Parent=mRNA20294
scaffold_53 .   mRNA    743863  747177  .   +   .   ID=mRNA20294;Parent=gene18656;Name=fgenesh1_pg.C_scaffold_53000130
scaffold_53 .   exon    743863  743985  .   +   .   ID=exon100811;Parent=mRNA20294
scaffold_53 .   exon    744048  744451  .   +   .   ID=exon100812;Parent=mRNA20294
scaffold_53 .   exon    744540  746113  .   +   .   ID=exon100813;Parent=mRNA20294
scaffold_53 .   exon    746183  747177  .   +   .   ID=exon100814;Parent=mRNA20294
scaffold_53 .   CDS 743863  743985  .   +   0   ID=CDS100063;Parent=mRNA20294
scaffold_53 .   CDS 744048  744451  .   +   0   ID=CDS100064;Parent=mRNA20294
scaffold_53 .   CDS 744540  746113  .   +   0   ID=CDS100065;Parent=mRNA20294
scaffold_53 .   CDS 746183  747177  .   +   0   ID=CDS100066;Parent=mRNA20294

thanks,


Hey, well, just looking at your output, I can see that mRNA20294 is indeed duplicated:

scaffold_29 .   mRNA    1184259 1185319 .   -   .   ID=mRNA20294;Parent=gene20294;Name=SNAP_00012882
...
scaffold_53 .   mRNA    743863  747177  .   +   .   ID=mRNA20294;Parent=gene18656;Name=fgenesh1_pg.C_scaffold_53000130

How did you produce this data?
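Rather than grepping IDs one at a time, a rough sketch like the following could list every gene or mRNA ID that appears on more than one record, and also flag attributes with an empty value such as the Name= that gffread aborted on. It assumes a tab-separated GFF3 file; check_gff3 and parse_attributes are hypothetical helper names. Only gene and mRNA records are checked for duplicates because GFF3 allows a multi-line feature such as a CDS to reuse one ID.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a list of (key, value) pairs."""
    pairs = []
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            pairs.append((key.strip(), value.strip()))
    return pairs


def check_gff3(gff_lines, feature_types=("gene", "mRNA")):
    """Return (duplicates, empty_values) for an iterable of GFF3 lines.

    duplicates:   {ID: [(seqid, start, end), ...]} for IDs carried by more
                  than one record of the checked feature types.
    empty_values: [(line_number, key), ...] for attributes written as "key=".
    """
    locations = defaultdict(list)
    empty_values = []
    for line_number, line in enumerate(gff_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a standard 9-column record
        seqid, feature_type = columns[0], columns[2]
        start, end = int(columns[3]), int(columns[4])
        for key, value in parse_attributes(columns[8]):
            if value == "":
                empty_values.append((line_number, key))
            elif key == "ID" and feature_type in feature_types:
                locations[value].append((seqid, start, end))
    duplicates = {fid: locs for fid, locs in locations.items() if len(locs) > 1}
    return duplicates, empty_values
```

On the records above, this would report mRNA20294 once with both of its locations (scaffold_29 and scaffold_53), which makes it easier to judge how widespread the ID clashes are before deciding how to fix the file.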


Thanks for the quick reply. I downloaded the reference files for Daphnia pulex from the JGI website. The only options are a .gff file or a .gff3 file. I used the downloaded .gff3 file with the command gffread -E myfile.gff3 -T -o myfile.gtf, and got nothing in my output. This is what the .gff3 file looks like:

##gff-version 3
scaffold_1 . gene 62188 73952 . + . ID=gene1;Name=gw1.1.25.1
scaffold_1 . mRNA 62188 73952 . + . ID=mRNA1;Parent=gene1;Name=gw1.1.25.1
scaffold_1 . exon 62188 62346 . + . ID=exon1;Parent=mRNA1
scaffold_1 . exon 62647 62736 . + . ID=exon2;Parent=mRNA1
scaffold_1 . exon 67111 67169 . + . ID=exon3;Parent=mRNA1
scaffold_1 . exon 67365 67525 . + . ID=exon4;Parent=mRNA1


But if I use the .gff file, which looks like this:

scaffold_1 JGI exon 264824 264890 . + . name "estExt_fgenesh1_kg.C_10002"; transcriptId 230065
scaffold_1 JGI CDS 264882 264890 . + 0 name "estExt_fgenesh1_kg.C_10002"; proteinId 230065; exonNumber 1
scaffold_1 JGI start_codon 264882 264884 . + 0 name "estExt_fgenesh1_kg.C_10002"
scaffold_1 JGI exon 265102 265158 . + . name

then I get no error from the same gffread command, but my output file is still empty. I don't know what I am supposed to do. I used the FASTA and GFF3 files to generate the genome index in STAR and mapped my reads to generate the BAM files. I tried to use HTSeq to convert the BAM files to raw counts so I could do the DEG analysis. I know HTSeq expects a GTF file, but no such file is available, so I am trying to use gffread to generate a GTF file from the .gff or .gff3 file. Thanks.


I tried the .gff file with gffread-0.9.12.OSX_x86_64 and got lots of warnings from the same command:

Warning: invalid GTF record, transcript_id not found: scaffold_999 JGI CDS 40436 40726 . - 0 name "SNAP_00037760"; proteinId 269702; exonNumber 2
Warning: invalid GTF record, transcript_id not found: scaffold_999 JGI stop_codon 40436 40438 . - 0 name "SNAP_00037760"
Warning: invalid GTF record, transcript_id not found: scaffold_999 JGI exon 41056 41220 . - . name "SNAP_00037760"; transcriptId 269702
Warning: invalid GTF record, transcript_id not found: scaffold_999 JGI CDS 41056 41220 . - 0 name "SNAP_00037760"; proteinId 269702; exonNumber 1
Warning: invalid GTF record, transcript_id not found: scaffold_999 JGI start_codon 41218 41220 . - 0 name "SNAP_00037760"
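The warnings suggest that the records are being read as GTF but carry no transcript_id attribute, only name, transcriptId and proteinId. One possible workaround, sketched below in Python, is to prepend gene_id and transcript_id attributes built from the name value, since name is the only key present on every record shown. This rests on assumptions that should be checked against the real file: that it is tab-separated, that every record of one gene model shares the same name, and that there is one transcript per name. The function names are hypothetical.

```python
import re

# Matches one JGI-style attribute, e.g.  name "SNAP_00037760"  or  transcriptId 269702
ATTRIBUTE = re.compile(r'\s*(\w+)\s+(?:"([^"]*)"|(\S+))\s*')


def parse_jgi_attributes(column9):
    """Parse 'key "value"; key value' pairs into a dict (insertion-ordered)."""
    attrs = {}
    for field in column9.split(";"):
        match = ATTRIBUTE.fullmatch(field)
        if match:
            key, quoted, bare = match.groups()
            attrs[key] = quoted if quoted is not None else bare
    return attrs


def jgi_line_to_gtf(line):
    """Return the record with gene_id/transcript_id added, or None if unusable.

    Assumption: the 'name' value identifies both the gene and its transcript.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return None  # not a standard 9-column record
    attrs = parse_jgi_attributes(columns[8])
    name = attrs.get("name")
    if name is None:
        return None  # nothing to build the IDs from
    # Keep the remaining attributes, re-quoted in GTF style.
    extras = [f'{key} "{value}";' for key, value in attrs.items() if key != "name"]
    columns[8] = " ".join([f'gene_id "{name}";', f'transcript_id "{name}";'] + extras)
    return "\t".join(columns)
```

Each input line could be passed through jgi_line_to_gtf and the non-None results written to a new file. By default htseq-count counts exon features grouped by gene_id, so it is worth inspecting a few converted exon records before running it.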
