GFF file error in htseq-count
y.gladbach, 7.5 years ago

Hi there, I need your help :). I think my GFF file is not correctly formatted, but I'm not sure, so please let me know if that is the problem. Any help is much appreciated. I get an error when calling htseq-count:

samtools view -h {} |  htseq-count -m intersection-nonempty -i Name -t ID -s no - piRNA_GRCh37.gff > {.}.count
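In htseq-count, `-t` selects the feature type that is matched against column 3 of the GFF file (default `exon`), and `-i` names the column-9 attribute whose value becomes the feature ID (default `gene_id`). In the command above, `-t ID` names an attribute rather than a feature type, and `-i Name` does not match the lowercase `name` key in this file, so even with a well-formed file it would probably need to be something like `-t piRNA -i ID`. The minimal sketch below lists the available values. It assumes tab-separated columns, and `summarize` is a hypothetical helper, not part of HTSeq.

```python
from collections import Counter

# Two records copied from the DASHR file shown further down. GFF columns are
# tab-separated; the forum rendering turned the tabs into spaces.
GFF_LINES = [
    "chr1\tncbi\tpiRNA\t74625\t74655\t.\t+\t.\t"
    "name=piR-61514;gi;108090112;gb;DQ595402.1;ID=piR-61514",
    "chr1\tncbi\tpiRNA\t75435\t75464\t.\t+\t.\t"
    "name=piR-37026;gi;108096889;gb;DQ598960.1;ID=piR-37026",
]


def summarize(lines):
    """Count column-3 feature types and the key=value attribute keys in column 9."""
    types, keys = Counter(), Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        cols = line.rstrip("\n").split("\t")
        types[cols[2]] += 1  # candidate values for -t
        for field in cols[8].split(";"):
            if "=" in field:
                keys[field.split("=", 1)[0].strip()] += 1  # candidates for -i
    return types, keys


types, keys = summarize(GFF_LINES)
print("feature types (-t):", dict(types))
print("attribute keys (-i):", dict(keys))
```

For a real file, pass an open handle instead, e.g. `summarize(open("piRNA_GRCh37.gff"))`. Attribute keys are case-sensitive, which is why `name` and `Name` are not interchangeable for `-i`.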

The error:

Error occured when processing GFF file (line 1 of file /home/gladbach/Documents/genomes/GRCh37/DASHR/piRNA_GRCh37.gff):
Failure parsing GFF attribute line
[Exception type: ValueError, raised in __init__.py:164]

And here is my GFF file:

chr1    ncbi    piRNA   74625   74655   .   +   .   name=piR-61514;gi;108090112;gb;DQ595402.1;ID=piR-61514
chr1    ncbi    piRNA   75435   75464   .   +   .   name=piR-37026;gi;108096889;gb;DQ598960.1;ID=piR-37026
chr1    ncbi    piRNA   135736  135765  .   -   .   name=piR-43123;gi;108056654;gb;DQ575011.1;ID=piR-43123
chr1    ncbi    piRNA   137313  137343  .   -   .   name=piR-43675;gi;108057724;gb;DQ575563.1;ID=piR-43675
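The error comes from column 9. Attributes are separated by `;`, and each one is expected to be a key followed by `=` (or whitespace) and a value. Bare tokens such as `gi` or `108090112` have no value part, so they cannot be parsed. The minimal sketch below flags such fields. The regular expression is an approximation modeled on the kind of pattern HTSeq uses for a single attribute, not HTSeq's actual source, and `bad_attributes` is a hypothetical helper.

```python
import re

# Approximation of a single-attribute pattern: optional leading whitespace,
# a key with no spaces or '=', at least one '=' or whitespace, then the value.
ATTR_RE = re.compile(r"\s*([^\s=]+)[\s=]+(.*)")


def bad_attributes(col9):
    """Return the ';'-separated fields that do not look like key/value pairs."""
    bad = []
    for field in col9.split(";"):
        if not field.strip():
            continue  # empty fields (e.g. after a trailing ';') are ignored
        if not ATTR_RE.match(field):
            bad.append(field)
    return bad


col9 = "name=piR-61514;gi;108090112;gb;DQ595402.1;ID=piR-61514"
print(bad_attributes(col9))
# ['gi', '108090112', 'gb', 'DQ595402.1']
```

Only `name=piR-61514` and `ID=piR-61514` pass; the four fields in between are what trigger "Failure parsing GFF attribute line".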

Where did you get that GFF file? It's highly malformed.


I got it from DASHR.

It's for piRNAs.


You might be able to salvage that with:

sed -e "s/;gi;/;gi=/g; s/;gb;/;gb=/g; s/;/; /g" input.gff > modified.gff
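To check what the sed substitutions do to one record, here is a minimal Python sketch of the same three steps. It uses `;gb=` in the second substitution. A colon there (`:gb=`) would glue the accession onto the `gi` value, giving `gi=108090112:gb=DQ595402.1`. `salvage` is a hypothetical helper name.

```python
def salvage(line):
    """Mirror the sed one-liner, with ';gb=' in the second substitution."""
    line = line.replace(";gi;", ";gi=")  # s/;gi;/;gi=/g
    line = line.replace(";gb;", ";gb=")  # s/;gb;/;gb=/g
    return line.replace(";", "; ")       # s/;/; /g


raw = ("chr1\tncbi\tpiRNA\t74625\t74655\t.\t+\t.\t"
       "name=piR-61514;gi;108090112;gb;DQ595402.1;ID=piR-61514")
print(salvage(raw).split("\t")[8])
# name=piR-61514; gi=108090112; gb=DQ595402.1; ID=piR-61514
```

After this, every field is a key/value pair. The key is still the lowercase `name`, though, so htseq-count would need `-i name` or `-i ID` rather than `-i Name`.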


Maybe this format would work for column 9: "name=piR-43675,gi:108057724,gb:DQ575563.1;ID=piR-43675". Attributes are allowed to have multiple values in GFF, but the values should be comma-separated; the semicolon is reserved as the separator between attributes. Technically, Name is also an official reserved attribute in GFF, so it should be written "Name=" as well.
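A minimal sketch of that rewrite, using `Name=` as recommended. It assumes every DASHR record has the fixed layout `name=X;gi;N;gb;ACC;ID=Y` seen in the sample and raises an error otherwise. `to_comma_format` and `convert_line` are hypothetical helpers.

```python
def to_comma_format(col9):
    """Rewrite DASHR column 9 into the comma-separated form suggested above.

    Assumes every record has the fixed layout name=X;gi;N;gb;ACC;ID=Y.
    """
    fields = col9.strip().split(";")
    if len(fields) != 6 or fields[1] != "gi" or fields[3] != "gb":
        raise ValueError(f"unexpected attribute layout: {col9!r}")
    name = fields[0].split("=", 1)[1]
    gi, accession, id_attr = fields[2], fields[4], fields[5]
    # Name is reserved (capitalised) in GFF3; the extra values follow after commas.
    return f"Name={name},gi:{gi},gb:{accession};{id_attr}"


def convert_line(line):
    """Apply to_comma_format to column 9 of one tab-separated GFF line."""
    cols = line.rstrip("\n").split("\t")
    cols[8] = to_comma_format(cols[8])
    return "\t".join(cols)


print(to_comma_format("name=piR-43675;gi;108057724;gb;DQ575563.1;ID=piR-43675"))
# Name=piR-43675,gi:108057724,gb:DQ575563.1;ID=piR-43675
```

Failing loudly on an unexpected layout is deliberate, so that a record with a different structure is not silently mangled.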


I have now downloaded the data from piRNABank instead (I trust that website more) and formatted my file like this. Would this be correct GFF3? Should the columns be tab- or space-separated, and do I need a header line starting with #?

1       .    piRNA  4493  4520       .  +       . ID=hsa_piR_013426;accession_number=DQ588205
1       .    piRNA  8399  8426       .  +       . ID=hsa_piR_005239;accession_number=DQ577218
1       .    piRNA 16669 16699       .  -       . ID=hsa_piR_016792;accession_number=DQ593109
1       .    piRNA 21997 22023       .  -       . ID=hsa_piR_019669;accession_number=DQ596983
1       .    piRNA 29543 29573       .  -       . ID=hsa_piR_014636;accession_number=DQ590030
1       .    piRNA 41516 41544       .  -       . ID=hsa_piR_016271;accession_number=DQ592181

That looks better and will probably work (perhaps you'll need a space after the semi-colon).
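On the tab-versus-space question: the GFF3 specification requires the nine columns to be tab-separated and the file to start with a `##gff-version 3` line. htseq-count skips lines beginning with `#`, so the pragma should be harmless. The minimal sketch below writes records in the piRNABank layout shown above. It assumes the values have already been parsed into tuples, and `to_gff3` is a hypothetical helper.

```python
# Records in the piRNABank-derived layout shown above:
# (seqid, start, end, strand, piRNA ID, accession number).
RECORDS = [
    ("1", 4493, 4520, "+", "hsa_piR_013426", "DQ588205"),
    ("1", 16669, 16699, "-", "hsa_piR_016792", "DQ593109"),
]


def to_gff3(records, source=".", feature_type="piRNA"):
    """Yield GFF3 lines: the version pragma, then one tab-separated row per record."""
    yield "##gff-version 3"
    for seqid, start, end, strand, pirna_id, accession in records:
        attrs = f"ID={pirna_id};accession_number={accession}"
        # seqid, source, type, start, end, score, strand, phase, attributes
        cols = [seqid, source, feature_type, str(start), str(end),
                ".", strand, ".", attrs]
        yield "\t".join(cols)


for line in to_gff3(RECORDS):
    print(line)
```

The seqid column must also match the reference names in the BAM header. The DASHR file used `chr1` while this one uses `1`, so check which convention the alignments use before counting with `-t piRNA -i ID`.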
