Tabix With Gff3
10.8 years ago

I'm very new to bioinformatics (VERY), so could someone please help with this?

First I run this command:

bgzip data.gff3

Then this is created:

data.gff3.gz

Then I run:

tabix -p gff data.gff3.gz

This is the error I get:

[get_intv] the following line cannot be parsed and skipped: 
[ti_index_core] the indexes overlap or are out of bounds

I know there is nothing wrong with my data (it is used elsewhere in my application).

Can someone point me in the right direction to solve this?

tabix gff gff3 • 12k views
ADD COMMENT

Hey guys!

Sorry about the simplicity of the question! But I've been trying for hours to open a GFF in Artemis to check some gene models and I can't!! The FASTA and sorted BAM files work well, but I can't get the GFF to load with them at all!!

So I tried to sort and index it with tabix, but it's not working!

The sort with the line below seems to go fine:

(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz

But then, when I try

tabix -p sorted.gff.gz

It does not work! It says:

[E::get_intv] failed to parse TBX_GENERIC, was wrong -p [type] used? The offending line was: "itr6_2569_ AUGUSTUS CDS 1416 1422 . + 0 ID=itr6_2569_pilon_pilon_pilon_pi.g1.t1.cds;Parent=itr6_2569_pilon_pilon_pilon_pi.g1.t1"
Segmentation fault (core dumped)

My gff is a PASA update output:

file: # original

itr6_6049_ AUGUSTUS gene 18202 46612 . - . ID=itr6_6049_pi.g765;Name=itr6_6049_pi.g765.t1

itr6_6049_ AUGUSTUS mRNA 18202 46612 . - . ID=itr6_6049_pi.g765.t1;Parent=itr6_6049_pi.g765;Name=itr6_6049_pi.g765.t1

itr6_6049_ AUGUSTUS exon 46565 46612 . - . ID=itr6_6049_pi.g765.t1.exon1;Parent=itr6_6049_pi.g765.t1

I would appreciate any help. This is really annoying me because it seems simple, I know =(

Thank you!

ADD REPLY

This is a new question, not a reply. Please open a new question.

ADD REPLY

Oh sorry, Pierre! I'll do that!

ADD REPLY

You should use "tabix -p gff filename.gff.gz", not just "tabix -p filename.gff.gz". The -p option takes a preset name (here gff) as its argument.

ADD REPLY
10.8 years ago

Check that your GFF file:

  • is sorted on chrom/start
  • doesn't contain a blank line.
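
Both checks can be scripted before running tabix. The following is only a rough sketch: check_gff is a hypothetical helper name (it is not part of tabix), and it assumes a tab-separated file with the sequence name in column 1 and the start coordinate in column 4. It may not catch every condition tabix rejects.

```shell
# check_gff FILE (hypothetical helper, not part of tabix):
# reports blank lines and records that are out of order, and
# returns a non-zero status if it found any.
check_gff() {
    awk -F '\t' '
        /^#/ { next }                              # skip comment/directive lines
        /^[[:space:]]*$/ {                         # blank or whitespace-only line
            printf "line %d: blank line\n", NR
            bad = 1
            next
        }
        {
            if ($1 == prev_seq && $4 + 0 < prev_start) {
                # same sequence, but the start coordinate went backwards
                printf "line %d: start %s comes after start %s on %s\n", NR, $4, prev_start, $1
                bad = 1
            } else if ($1 != prev_seq && ($1 in seen)) {
                # sequence name seen earlier, with another sequence in between
                printf "line %d: records for %s are not contiguous\n", NR, $1
                bad = 1
            }
            seen[$1] = 1
            prev_seq = $1
            prev_start = $4 + 0
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Calling check_gff data.gff3 prints one message per problem line and prints nothing when the file looks fine.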
ADD COMMENT

Thanks, but now there is one error: [ti_index_core] the file out of order at line 19

This is line 19: ctgA example SNP 1000 1000 0.987 . . ID=FakeSNP1;Name=FakeSNP;Note=This is a fake SNP that should appear at 1000 with length 1

ADD REPLY

Your file is not sorted.

ADD REPLY

Alright, thanks. Any direction you can give me on sorting?

ADD REPLY

There's a note in the tabix documentation on sorting a GFF (reproduced below, except that I needed to run the bgzip step separately):

(grep ^"#" in.gff3; grep -v ^"#" in.gff3 | sort -k1,1 -k4,4n) > out.sorted.gff3
bgzip out.sorted.gff3
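
Putting the thread together, the whole sort, compress and index sequence could be wrapped roughly like this. It is a sketch, not the documented recipe: sort_gff3 and index_gff3 are hypothetical function names, the explicit tab separator, LC_ALL=C and the dropping of blank lines are my own additions, and index_gff3 assumes bgzip and tabix are on the PATH.

```shell
# sort_gff3 IN OUT (hypothetical helper): comment lines first, then
# records sorted by sequence name (column 1) and start (column 4).
sort_gff3() {
    {
        grep '^#' "$1" || true                     # a file without comment lines is fine
        grep -v '^#' "$1" |
            grep -v '^[[:space:]]*$' |             # drop blank lines (my addition)
            LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
    } > "$2"
}

# index_gff3 IN (hypothetical helper): sort, compress and index.
# Requires bgzip and tabix to be installed.
index_gff3() {
    sorted="${1%.gff3}.sorted.gff3"
    sort_gff3 "$1" "$sorted" &&
        bgzip -f "$sorted" &&                      # replaces $sorted with $sorted.gz
        tabix -p gff "$sorted.gz"                  # writes $sorted.gz.tbi
}
```

With that in place, index_gff3 data.gff3 should leave data.sorted.gff3.gz and its .tbi index, and a region can then be pulled out with something like tabix data.sorted.gff3.gz ctgA:1000-2000.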
ADD REPLY

Another way to sort a GFF is to use "genometools" and run

gt gff3 -sortlines input.gff > output.gff

This command also has a useful -tidy option that can clean up and validate very messy GFFs. It will rewrite the ID attributes, though, unless -retainids is used.

ADD REPLY
2.4 years ago
cmdcolin ★ 4.0k

Another command I found useful for sorting a GFF, using awk:

awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -t\"\t\" -k1,1 -k4,4n -k5,5n"}' file.gff > file.sorted.gff

This is similar to the method from http://www.htslib.org/doc/tabix.html, but it properly sets the tab delimiter on the sort command and avoids subshells.
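
To see why the tab delimiter can matter, here is a small made-up example (the two records and the "my tool" source name are invented for illustration). Without -t, sort splits fields on blanks, so a space inside a column shifts the field numbering and -k4,4n may no longer point at the start coordinate.

```shell
tab=$(printf '\t')
demo=$(mktemp)

# Two invented records whose source column ("my tool") contains a space.
printf 'chr1\tmy tool\tgene\t200\t300\t.\t+\t.\tID=g2\n'  > "$demo"
printf 'chr1\tmy tool\tgene\t50\t100\t.\t+\t.\tID=g1\n'  >> "$demo"

# Default field splitting: the space in "my tool" counts as a separator,
# so key 4 is not the start coordinate here.
naive=$(LC_ALL=C sort -k1,1 -k4,4n "$demo" | cut -f9)

# Explicit tab separator: key 4 is the start coordinate.
tabbed=$(LC_ALL=C sort -t "$tab" -k1,1 -k4,4n "$demo" | cut -f9)

echo "without -t:"; echo "$naive"
echo "with -t:";    echo "$tabbed"

rm -f "$demo"
```

With the explicit separator the record starting at 50 (ID=g1) comes first; the order produced without -t is whatever the shifted key happens to give.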

ADD COMMENT

avoids subshells

Well, there is still a system call...

ADD REPLY
2.4 years ago
Juke34 8.9k

AGAT and GFF3sort can sort GFF/GTF files to produce a proper tabix-compliant ordering. More information here

ADD COMMENT
