gff3 to bed12
6.2 years ago
Yewon ▴ 30

I have been having trouble converting my GFF3 file of the strawberry (Fragaria vesca) genome to BED12 format, which I need for annotating differentially methylated bases. I have read through several of the solutions offered but have not found one that works for my data. I did come across a GitHub script (https://github.com/pzross/iver/blob/master/R/bioinfo.R) that requires downloading the UCSC tools gff3ToGenePred and genePredToBed and running the script in R to generate the BED12 file. At the step that generates the genePred file, I get the following error message:

/tmp/tmp.gff:0: empty GFF file, must have header
/tmp/tmp.gff:0: invalid GFF3 header
GFF3: 2 parser errors

My GFF3 file looks fine (it has 9 columns).

I would appreciate any help.

Thank you in advance

R gff3 bed12

If you already have the 2 tools from UCSC, did you try them without R?

gff3ToGenePred infile.gff3 temp.genePred
genePredToBed temp.genePred out.bed
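If you would rather drive these two commands from a script (as the R code does), here is a minimal Python sketch. It uses only the two commands and the temp.genePred intermediate from this reply; the function names are hypothetical, and it assumes both tools are on your PATH and exit with a non-zero status on fatal errors. Passing arguments as a list instead of a shell string means a directory name containing a space needs no escaping.

```python
import shutil
import subprocess


def gff3_to_bed12_commands(gff3_path, bed_path, genepred_path="temp.genePred"):
    """Return the two UCSC commands (GFF3 -> genePred -> BED12) as argument lists."""
    return [
        ["gff3ToGenePred", gff3_path, genepred_path],
        ["genePredToBed", genepred_path, bed_path],
    ]


def run_ucsc_conversion(gff3_path, bed_path, genepred_path="temp.genePred"):
    """Run both steps in order, failing early if a tool is not on PATH."""
    commands = gff3_to_bed12_commands(gff3_path, bed_path, genepred_path)
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    for cmd in commands:
        # check=True raises CalledProcessError if the tool exits non-zero
        subprocess.run(cmd, check=True)
```

Each path stays a single argument, so no shell quoting is involved.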

michael.ante, I did download the gff3ToGenePred and genePredToBed tools from UCSC through Anaconda. However, when I run the following command in the Anaconda Navigator terminal, I get errors. Below is the command I ran, followed by excerpts from the start and end of the output:

Command:

(wgbs-cpg) brukers-MacBook-Pro-3:~ bruker$ gff3ToGenePred /Users/bruker/Desktop/CpG\ Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3 out.GP -geneNameAttr=attr -bad=file -maxParseErrors=-50 maxConvertErrors=-50

Start of the output:

/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_AED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:3: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_AED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:7: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI

End of the output:

/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:405476: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_eAED
/Users/bruker/Desktop/CpG Rdata/Fragaria_vesca_v4.0.a1.transcripts.gff3:405476: invalid attribute tag, must start with an alphabetic character and be composed of alphanumeric, dash, or underscore characters:_QI
GFF3: 85764 parser errors


The error says that an attribute tag (like ID, Parent, or Name) must start with an alphabetic character. In your mRNA lines (e.g., line 3, the first one flagged), the attributes are:

ID=FvH4_c10g00030.1;Parent=FvH4_c10g00030;Name=FvH4_c10g00030.1;_AED=0.87;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80

Thus, _AED is not allowed, since it doesn't start with a letter. You can run a sed command to rename it accordingly:

sed 's/;_/;x_/g' Fragaria_vesca_v4.0.a1.transcripts.gff3 > altered.transcripts.gff3

Every attribute tag that starts with an underscore will then get an x in front of it (e.g., _AED becomes x_AED).
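If you prefer a script over sed, the same fix can be sketched in Python. This version (function names are hypothetical) only touches the 9th column, assumes tab-separated columns as GFF3 requires, and prefixes a lowercase x to any tag that does not start with a letter. The prefix should stay lowercase, because gff3ToGenePred treats non-standard tags starting with an uppercase letter as unknown standard attributes.

```python
def fix_tag(tag, prefix="x"):
    """Prefix tags that do not start with a letter, e.g. '_AED' -> 'x_AED'."""
    if tag and not tag[0].isalpha():
        return prefix + tag
    return tag


def fix_attributes(column9, prefix="x"):
    """Rewrite every tag in a GFF3 attribute column of the form 'tag=value;tag=value'."""
    if column9 == ".":
        return column9  # no attributes on this feature
    fixed = []
    for pair in column9.split(";"):
        if not pair:
            continue  # drops the empty item left by a trailing ';'
        tag, sep, value = pair.partition("=")
        fixed.append(fix_tag(tag, prefix) + sep + value)
    return ";".join(fixed)


def fix_gff3_line(line, prefix="x"):
    """Fix tags on a feature line; directives, comments and odd lines pass through."""
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a tab-separated 9-column feature line; leave it alone
    fields[8] = fix_attributes(fields[8], prefix)
    return "\t".join(fields) + newline


def fix_gff3_file(src, dst, prefix="x"):
    """Stream src to dst, fixing attribute tags line by line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(fix_gff3_line(line, prefix))
```

For example, fix_gff3_file("Fragaria_vesca_v4.0.a1.transcripts.gff3", "altered.transcripts.gff3") should produce the same result as the sed command for lines like the one above.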


Interesting,

an attribute tag (like ID, Parent, or Name) must start with an alphabetic character.

gff3ToGenePred introduces a peculiarity into the expected GFF3 format that does not exist in the official definition of the format.


Maybe it's a requirement for genePred (although not mentioned here)?


michael.ante, I do appreciate your help so far. I was able to insert an "x" before the underscore. However, I have run into another problem: the converted GFF3 file still generates errors. Below is an excerpt of the output:

Command used:

gff3ToGenePred -maxParseErrors=50 /Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3 Fragariavesca.GP

Parsing error messages:

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_AED

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_eAED

/Users/bruker/anaconda2/envs/wgbs-cpg/edited.transcripts.gff3:4: unknown standard attribute, user defined attributes must start with a lower-case letter:X_QI

I looked into the converted file and realized that the "x" inserted before the underscore was uppercase, even though I used a lowercase "x". How can I fix this?

I am out of options. Please help.


Why not use the command I suggested, which inserts an x instead of an X?
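Taken together, the two gff3ToGenePred messages in this thread describe two rules: a tag must start with a letter and contain only letters, digits, dashes or underscores, and a tag starting with an uppercase letter must be a predefined attribute. The Python sketch below checks an edited file against both rules before you run the converter. The reserved-tag list is the set of predefined attributes from the GFF3 specification; it is an assumption that gff3ToGenePred accepts exactly this set, and the function names are hypothetical.

```python
import re

# Rule 1, paraphrased from the first gff3ToGenePred message in this thread.
TAG_SYNTAX = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

# Predefined attributes listed in the GFF3 specification (assumed to be the
# set gff3ToGenePred accepts as "standard").
STANDARD_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term", "Is_circular",
}


def tag_problem(tag):
    """Return why a tag would be rejected, or None if it passes both rules."""
    if not TAG_SYNTAX.fullmatch(tag):
        return "must start with a letter and use only letters, digits, '-' or '_'"
    if tag[0].isupper() and tag not in STANDARD_TAGS:
        return "uppercase tag that is not a predefined GFF3 attribute"
    return None


def check_gff3(lines):
    """Yield (line_number, tag, problem) for every suspicious attribute tag."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[8] == ".":
            continue
        for pair in fields[8].split(";"):
            if pair:
                tag = pair.partition("=")[0]
                problem = tag_problem(tag)
                if problem:
                    yield lineno, tag, problem
```

Iterating over check_gff3(open("edited.transcripts.gff3")) and printing each tuple should flag the X_AED, X_eAED and X_QI tags that gff3ToGenePred complained about, while tags with a lowercase x pass.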


Please provide us with a few lines from the beginning of your GFF3 file.


Below is the beginning of the GFF3 file:

#gff-version 3
contig_10   maker   gene    34303   34545   .   -   .   ID=FvH4_c10g00030;Name=FvH4_c10g00030
contig_10   maker   mRNA    34303   34545   .   -   .   ID=FvH4_c10g00030.1;Parent=FvH4_c10g00030;Name=FvH4_c10g00030.1;_AED=0.87;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    34303   34545   .   -   .   ID=FvH4_c10g00030.1:1;Parent=FvH4_c10g00030.1
contig_10   maker   CDS 34303   34545   .   -   0   ID=FvH4_c10g00030.1:cds;Parent=FvH4_c10g00030.1
contig_10   maker   gene    16709   16951   .   -   .   ID=FvH4_c10g00020;Name=FvH4_c10g00020
contig_10   maker   mRNA    16709   16951   .   -   .   ID=FvH4_c10g00020.1;Parent=FvH4_c10g00020;Name=FvH4_c10g00020.1;_AED=0.88;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    16709   16951   .   -   .   ID=FvH4_c10g00020.1:1;Parent=FvH4_c10g00020.1
contig_10   maker   CDS 16709   16951   .   -   0   ID=FvH4_c10g00020.1:cds;Parent=FvH4_c10g00020.1
contig_10   maker   gene    4883    5125    .   -   .   ID=FvH4_c10g00010;Name=FvH4_c10g00010
contig_10   maker   mRNA    4883    5125    .   -   .   ID=FvH4_c10g00010.1;Parent=FvH4_c10g00010;Name=FvH4_c10g00010.1;_AED=0.88;_eAED=1.00;_QI=0|-1|0|1|-1|1|1|0|80
contig_10   maker   exon    4883    5125    .   -   .   ID=FvH4_c10g00010.1:1;Parent=FvH4_c10g00010.1
contig_10   maker   CDS 4883    5125    .   -   0   ID=FvH4_c10g00010.1:cds;Parent=FvH4_c10g00010.1
###
contig_1    maker   gene    2432    2674    .   +   .   ID=FvH4_c1g00020;Name=FvH4_c1g00020
contig_1    maker   mRNA    2432    2674    .   +   .   ID=FvH4_c1g00020.1;Parent=FvH4_c1g00020;Name=FvH4_c1g00020.1;_AED=0.29;_eAED=0.29;_QI=0|-1|0|1|-1|1|1|0|80
contig_1    maker   exon    2432    2674    .   +   .   ID=FvH4_c1g00020.1:1;Parent=FvH4_c1g00020.1
contig_1    maker   CDS 2432    2674    .   +   0   ID=FvH4_c1g00020.1:cds;Parent=FvH4_c1g00020.1
contig_1    maker   gene    61177   63300   .   +   .   ID=FvH4_c1g00310;Name=FvH4_c1g00310
contig_1    maker   mRNA    61177   63300   .   +   .

According to the spec, the header should start with two '#' characters:

The ##gff-version 3 line is required and must be the first line of the file. It introduces the annotation section of the file.
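To confirm this on the actual file (rather than on a copy pasted into the forum), it is enough to look at its first line. A minimal Python sketch (function names are hypothetical): it accepts ##gff-version 3 and dotted versions such as 3.1.26, which the specification also allows, and treats an empty file as invalid, consistent with the "empty GFF file, must have header" message.

```python
def is_gff3_header(line: str) -> bool:
    """True for '##gff-version 3' or a dotted 3.x.y version directive."""
    parts = line.strip().split()
    return (
        len(parts) == 2
        and parts[0] == "##gff-version"
        and parts[1].split(".")[0] == "3"
    )


def file_has_gff3_header(path: str) -> bool:
    """Check only the first line of the file, as the spec requires."""
    with open(path) as fh:
        return is_gff3_header(fh.readline())  # '' for an empty file -> False
```

A single-'#' line such as #gff-version 3 fails this check, as does a file whose first line is blank.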


michael.ante, you are right. I mistakenly omitted one of the # characters when copying the file. The original file header is ##gff-version 3. Thank you for pointing out the error.


zx8754, thanks for editing my GFF3 excerpt. It now looks much more like the original version.


So, your file looks perfectly fine; it's as standard a GFF3 file as you can get. Either you aren't passing the proper file to your tool (check the path), or the tool expects a particular GFF-like file. Maybe the tool doesn't handle the ### lines and treats them like an empty header? You could also try providing only the first record, together with the ##gff-version 3 header.
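Cutting the file down to one record can be scripted. This Python sketch keeps the leading directives (including ##gff-version 3) plus the first top-level feature and its children, stopping at the next top-level feature or at a ### line. It assumes tab-separated columns and, as in the excerpt above, that top-level features are the ones without a Parent attribute; the function names are hypothetical.

```python
def first_record(lines):
    """Return leading directives plus the first top-level feature and its children.

    Assumes top-level features are those without a Parent attribute, as for
    the gene lines in the excerpt above.
    """
    out = []
    seen_top = False
    for line in lines:
        if line.startswith("#"):
            if not seen_top:
                out.append(line)      # e.g. '##gff-version 3'
            elif line.startswith("###"):
                break                 # '###' closes the first record
            continue                  # comments after the first feature are skipped
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue                  # blank or malformed line
        tags = [pair.partition("=")[0] for pair in fields[8].split(";")]
        is_top = "Parent" not in tags
        if is_top and seen_top:
            break                     # the next gene starts here
        if is_top:
            seen_top = True
        if seen_top:
            out.append(line)
    return out


def write_first_record(src, dst):
    """Write the header plus first record of src to dst for a small test run."""
    with open(src) as fin:
        record = first_record(fin)
    with open(dst, "w") as fout:
        fout.writelines(record)
```

Running gff3ToGenePred on the small output file (e.g., a hypothetical first_record.gff3) separates problems with the file's content from problems with how the file is passed to the tool.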


If you are already using R, the rtracklayer package should be able to do the same.


I did use rtracklayer as one of the packages in this conversion process, but the problem arose when I ran a script to create an intermediate genePred file.


Otherwise, I have a Perl script that should do the job: gff2bed.pl in the GAAS repository.

6.1 years ago
Jeffin Rockey ★ 1.3k

Alternative method:

Download EA-Utils.

First, run gff2gtf:

gff2gtf file.gff3 >file.gtf

Then run:

gtf2bed file.gtf >file.bed

This should produce a BED12 file corresponding to the initial GFF3 file.
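To show what such a converter produces, here is a minimal Python sketch of a direct GFF3 to BED12 conversion. It is not how EA-Utils or the UCSC tools work internally. It assumes transcripts are mRNA/transcript features with an ID, that exon and CDS lines point at them through Parent, and it does not percent-decode attribute values. BED uses 0-based, half-open coordinates, so each GFF3 start is decremented by one; blockStarts are relative to chromStart, and thickStart/thickEnd span the CDS, collapsing to chromStart for non-coding transcripts as the UCSC BED documentation suggests.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse 'k=v;k=v' into a dict (no percent-decoding; a simplification)."""
    attrs = {}
    for pair in column9.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def gff3_to_bed12(lines, transcript_types=("mRNA", "transcript")):
    """Yield one BED12 line per transcript, built from its exon and CDS children."""
    transcripts = {}              # transcript ID -> (seqid, start0, end, strand)
    exons = defaultdict(list)     # transcript ID -> [(start0, end), ...]
    cds = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) != 9:
            continue
        seqid, ftype, start, end, strand = f[0], f[2], int(f[3]), int(f[4]), f[6]
        attrs = parse_attributes(f[8])
        if ftype in transcript_types and "ID" in attrs:
            transcripts[attrs["ID"]] = (seqid, start - 1, end, strand)
        elif ftype in ("exon", "CDS"):
            target = exons if ftype == "exon" else cds
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    target[parent].append((start - 1, end))
    for tid, (seqid, tstart, tend, strand) in transcripts.items():
        blocks = sorted(exons.get(tid) or [(tstart, tend)])  # no exons: one block
        coding = cds.get(tid)
        if coding:
            thick_start = min(s for s, _ in coding)
            thick_end = max(e for _, e in coding)
        else:
            thick_start = thick_end = tstart  # no thick part
        sizes = "".join(f"{e - s}," for s, e in blocks)
        starts = "".join(f"{s - tstart}," for s, _ in blocks)
        yield "\t".join(map(str, [seqid, tstart, tend, tid, 0, strand,
                                  thick_start, thick_end, 0, len(blocks),
                                  sizes, starts]))
```

Score and itemRgb are simply set to 0 here; the BED12 blockSizes and blockStarts lists are written with trailing commas.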


The second command should be:

gtf2bed --input=file.gtf >file.bed

Thanks!

4.7 years ago
Juke34 8.9k

There are answers here too: A: conversion of GFF3 formate to BED format
