7.5 years ago
y.gladbach
Hi there, I need your help :). I suspect my GFF file is not formatted correctly, but I'm not sure, so please let me know if that is the case. Any help is much appreciated. I get the following error when calling htseq-count:
samtools view -h {} | htseq-count -m intersection-nonempty -i Name -t ID -s no - piRNA_GRCh37.gff > {.}.count
The error:
Error occured when processing GFF file (line 1 of file /home/gladbach/Documents/genomes/GRCh37/DASHR/piRNA_GRCh37.gff):
Failure parsing GFF attribute line
[Exception type: ValueError, raised in __init__.py:164]
And here is my GFF file:
chr1 ncbi piRNA 74625 74655 . + . name=piR-61514;gi;108090112;gb;DQ595402.1;ID=piR-61514
chr1 ncbi piRNA 75435 75464 . + . name=piR-37026;gi;108096889;gb;DQ598960.1;ID=piR-37026
chr1 ncbi piRNA 135736 135765 . - . name=piR-43123;gi;108056654;gb;DQ575011.1;ID=piR-43123
chr1 ncbi piRNA 137313 137343 . - . name=piR-43675;gi;108057724;gb;DQ575563.1;ID=piR-43675
Where did you get that GFF file? It's highly malformed.
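Concretely, HTSeq appears to expect every ;-separated piece of column 9 to be a tag/value pair (tag=value in GFF3, tag "value" in GTF). This file has bare tokens such as gi, 108090112, gb and DQ595402.1 sitting between the semicolons. The hypothetical helper bad_attributes below approximates that rule with a regular expression. It is not HTSeq's own code, just a quick way to see which pieces would be rejected.

```python
import re

# Approximation (assumption, not HTSeq's code) of the shape each attribute
# must have: a tag, then '=' and/or whitespace, then the value.
ATTR_RE = re.compile(r"\s*([^\s=]+)[\s=]+(.*)")


def bad_attributes(column9: str) -> list[str]:
    """Return the ;-separated pieces of column 9 that are not tag/value pairs."""
    bad = []
    for piece in column9.strip().split(";"):
        if not piece.strip():
            continue  # empty piece, e.g. left over from a trailing semicolon
        if ATTR_RE.match(piece) is None:
            bad.append(piece.strip())  # bare token with no tag/value separator
    return bad


print(bad_attributes("name=piR-61514;gi;108090112;gb;DQ595402.1;ID=piR-61514"))
```

On the attribute column of the first line it flags the four bare tokens, which is consistent with the error pointing at line 1.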
I got it from here: dashr
It's for piRNA
You might be able to salvage it with:
sed -e "s/;gi;/;gi=/g; s/;gb;/;gb=/g; s/;/; /g" input.gff > modified.gff
Maybe this format would work for column 9: "name=piR-43675,gi:108057724,gb:DQ575563.1;ID=piR-43675". An attribute is allowed to have multiple values in GFF, but they should be comma-separated; the semicolon is reserved as the separator between attributes. Technically, Name is also an official reserved attribute in GFF, so it should be "Name=" rather than "name=".
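That layout can also be produced mechanically. The following is a minimal sketch, assuming every line follows the DASHR pattern shown in the question: a name=... pair, then bare key;value tokens, then ID=.... The function name reformat_attributes is hypothetical. It capitalizes name to Name as recommended above and folds the bare tokens into comma-separated key:value values. It should be applied to the ninth tab-separated field only.

```python
def reformat_attributes(column9: str) -> str:
    """Rewrite 'name=X;gi;N;gb;ACC;ID=Y' as 'Name=X,gi:N,gb:ACC;ID=Y'.

    Assumes the DASHR layout seen in the question (hypothetical helper).
    """
    pieces = [p.strip() for p in column9.strip().split(";") if p.strip()]
    attrs: dict[str, list[str]] = {}  # tag -> values, in input order
    bare: list[str] = []              # bare tokens such as 'gi', '108057724'
    for piece in pieces:
        if "=" in piece:
            tag, value = piece.split("=", 1)
            if tag == "name":
                tag = "Name"  # GFF3 reserved tags are case-sensitive
            attrs.setdefault(tag, []).append(value)
        else:
            bare.append(piece)
    if len(bare) % 2:
        raise ValueError(f"odd number of bare tokens in: {column9!r}")
    # Pair bare tokens as key:value and attach them to Name as extra values.
    extra = [f"{k}:{v}" for k, v in zip(bare[0::2], bare[1::2])]
    attrs.setdefault("Name", []).extend(extra)
    return ";".join(f"{tag}={','.join(vals)}" for tag, vals in attrs.items())


print(reformat_attributes("name=piR-43675;gi;108057724;gb;DQ575563.1;ID=piR-43675"))
```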
I have now downloaded the data from pirnabank instead (I trust that website more) and formatted my file like this. Would this be correct GFF3? Should the columns be tab- or space-separated, and do I need a header line starting with #?
That looks better and will probably work (perhaps you'll need a space after the semi-colon).
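On the tab/space and header questions, the GFF3 specification says the nine columns are tab-separated and the file should begin with the "##gff-version 3" directive. Below is a minimal sketch, assuming each record is already split into its nine fields. The helper name to_gff3 is hypothetical.

```python
def to_gff3(records: list[list[str]]) -> str:
    """Join 9-field records into GFF3 text (hypothetical helper).

    Columns are tab-separated and the first line is the version directive.
    """
    lines = ["##gff-version 3"]
    for fields in records:
        if len(fields) != 9:
            raise ValueError(f"expected 9 columns, got {len(fields)}: {fields}")
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


rec = ["chr1", "ncbi", "piRNA", "74625", "74655", ".", "+", ".",
       "Name=piR-61514;ID=piR-61514"]
print(to_gff3([rec]), end="")
```

Once the file parses, it may also be worth revisiting the htseq-count flags in the original command. -t selects the feature type from column 3, and -i names the attribute used as the feature ID. For these lines, -t piRNA -i ID is probably what was intended, rather than -t ID.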