Inconsistent display of GFF files in IGB
9.2 years ago
Chris Cole

I'm finding that IGB displays GFF files differently when they are loaded via Quickload than via 'Open file...', even for exactly the same file. See below for an example.

In black is the Quickload version, and in blue is the same file loaded via the File menu. As you can see, the Quickload version shows only a single long feature with no introns.

[Image: Quickload vs. file]

In fact, neither is really what should be shown. There is only one gene model at this locus, so only one should be displayed, but IGB shows the 'mRNA', 'exon' and 'CDS' entries in the GFF file as separate tracks rather than as a single one.

As far as I can tell, the data is valid GFF3, but if I change the file extension from .gff to .gff3 I get the brown track. Yuk!

I really want to be able to use IGB for generating paper figures, but at the moment it is not possible.

Help, please!

One quick fix would be to grep your GFF file for a single feature class (mRNA, for example).

That won't work for 'mRNA', as I want to retain the introns. Using 'exon' does work, thanks. It doesn't feel right, however, as the GFF file should work as it is.
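A minimal Python sketch of that filtering step is below. It matches on the type column (column 3) instead of grepping the whole line, so a word like 'exon' appearing inside an attribute value cannot cause a false match. The function name is illustrative and not part of IGB.

```python
def filter_gff_by_type(lines, keep_type):
    """Yield '#' directive/comment lines plus the feature lines whose
    type (column 3) equals keep_type, e.g. 'exon'."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep ##gff-version, ##sequence-region, etc.
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == keep_type:
            yield line
```

For example, opening a (hypothetical) `gene.gff` and passing it with `"exon"` yields the header lines plus the exon lines, which can then be written to a new file with `writelines`.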


Yes, exons would be better. And of course the genome browser could take care of it...

9.2 years ago
dcnorris

Hi Chris,

I have spent some time looking at your file and at the IGB parser. The problem with the visualization comes from the duplicate IDs used in the file, a case IGB does not currently handle. On reviewing the specification, I noticed that http://www.sequenceontology.org/gff3.shtml now describes a scenario in which duplicate IDs are valid (I'm not sure the specification always included this allowance; see http://gmod.org/wiki/GFF3#GFF3_Format and note the difference).

The way duplicate IDs are used in your file is confusing, since the CDS and exon features do not seem to me to represent a single "discontinuous feature"; however, I think it's possible to allow for this sort of interpretation in the parser and produce the correct visualization.
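To see which IDs a file reuses, and on which feature types, something like the following Python sketch could be run before loading. Under the reading of the spec described above, an ID repeated on several lines of one type (for example a CDS split over several lines) describes a single discontinuous feature, whereas one ID shared by exon and CDS lines is the confusing case. The function names are hypothetical, and the attribute parsing ignores GFF3 percent-encoding.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse a GFF3 attribute column such as 'ID=a;Parent=b' into a dict
    (percent-encoded characters are left as-is)."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def find_duplicate_ids(lines):
    """Return {ID: [feature types, in file order]} for every ID used on more than one line."""
    types_by_id = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            continue
        feature_id = parse_attributes(fields[8]).get("ID")
        if feature_id is not None:
            types_by_id[feature_id].append(fields[2])
    return {fid: types for fid, types in types_by_id.items() if len(types) > 1}
```

A result like `{"x1": ["exon", "CDS"]}` flags the mixed-type case, while `{"cds1": ["CDS", "CDS"]}` is consistent with a discontinuous CDS.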

For now, you can visualize your data without issues by removing the duplicate IDs.
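One way to do that removal automatically is sketched below in Python, assuming, as in the corrected file, that the duplicated IDs sit on leaf features such as exons and CDSs. It keeps an ID whenever another line names it as a Parent, since dropping it would orphan those children. It would also strip IDs from a legitimately multi-line CDS, which is acceptable as a display workaround but not as a general fix. The names are illustrative, not an IGB tool.

```python
from collections import Counter


def _attributes(column9):
    """Split a GFF3 attribute column ('ID=a;Parent=b') into (key, value) pairs."""
    return [tuple(p.split("=", 1)) for p in column9.strip().split(";") if "=" in p]


def strip_duplicate_ids(lines):
    """Drop the ID attribute from feature lines whose ID appears on more than one line,
    unless another feature names that ID as its Parent (dropping it would orphan children)."""
    lines = list(lines)
    id_counts, parent_refs = Counter(), set()
    for line in lines:  # first pass: count IDs and collect Parent references
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            continue
        for key, value in _attributes(fields[8]):
            if key == "ID":
                id_counts[value] += 1
            elif key == "Parent":
                parent_refs.update(value.split(","))
    cleaned = []
    for line in lines:  # second pass: rewrite column 9
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9:
            cleaned.append(line)
            continue
        kept = [f"{k}={v}" for k, v in _attributes(fields[8])
                if not (k == "ID" and id_counts[v] > 1 and v not in parent_refs)]
        fields[8] = ";".join(kept) or "."  # GFF3 uses '.' for an empty column
        cleaned.append("\t".join(fields) + "\n")
    return cleaned
```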

Here is your file with the duplicate IDs removed: https://dl.dropboxusercontent.com/u/18695961/gene.gff3

##gff-version 3.2.1
##sequence-region    GAOABQK02G7M1S 1 774046
GAOABQK02G7M1S    .    gene    768466    771063    1    +    .    ID=g3866;Name=g3866
GAOABQK02G7M1S    .    mRNA    768466    771063    1    +    .    ID=g3866.t1;Parent=g3866
GAOABQK02G7M1S    .    CDS    768466    768553    1    +    0    Parent=g3866.t1
GAOABQK02G7M1S    .    CDS    768640    768852    1    +    2    Parent=g3866.t1
GAOABQK02G7M1S    .    CDS    768917    768957    1    +    2    Parent=g3866.t1
GAOABQK02G7M1S    .    CDS    769031    769132    1    +    0    Parent=g3866.t1
GAOABQK02G7M1S    .    CDS    769212    769257    1    +    0    Parent=g3866.t1
GAOABQK02G7M1S    .    CDS    769387    769793    1    +    2    Parent=g3866.t1
GAOABQK02G7M1S    .    CDS    769891    771060    1    +    0    Parent=g3866.t1
GAOABQK02G7M1S    .    exon    768466    768553    1    +    0    Parent=g3866.t1
GAOABQK02G7M1S    .    exon    768640    768852    1    +    2    Parent=g3866.t1
GAOABQK02G7M1S    .    exon    768917    768957    1    +    2    Parent=g3866.t1
GAOABQK02G7M1S    .    exon    769031    769132    1    +    0    Parent=g3866.t1
GAOABQK02G7M1S    .    exon    769212    769257    1    +    0    Parent=g3866.t1
GAOABQK02G7M1S    .    exon    769387    769793    1    +    2    Parent=g3866.t1
GAOABQK02G7M1S    .    exon    769891    771060    1    +    0    Parent=g3866.t1
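With the duplicate IDs gone, every exon points at the transcript through `Parent=g3866.t1`, which is enough for a parser to group the exons and draw the gaps between them as introns. The Python sketch below shows that grouping step; it is not IGB's parser, and it assumes the file is tab-delimited as GFF3 requires. Coordinates are 1-based and inclusive, so the intron between an exon ending at e and the next one starting at s runs from e + 1 to s - 1.

```python
from collections import defaultdict


def introns_by_parent(lines, child_type="exon"):
    """Group child features (default: exons) by their Parent attribute and return
    {parent_id: [(intron_start, intron_end), ...]} in 1-based inclusive coordinates."""
    blocks = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 9 or fields[2] != child_type:
            continue
        for pair in fields[8].strip().split(";"):
            if pair.startswith("Parent="):
                # A feature may list several parents, separated by commas.
                for parent in pair[len("Parent="):].split(","):
                    blocks[parent].append((int(fields[3]), int(fields[4])))
    introns = {}
    for parent, exons in blocks.items():
        exons.sort()
        introns[parent] = [
            (prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
            if next_start - prev_end > 1  # adjacent or overlapping blocks leave no intron
        ]
    return introns
```

For the seven exons listed above, the entry for `g3866.t1` contains six introns, the first spanning 768554-768639.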

This is the visualization after the changes mentioned above. I have not noticed any differences when loading from Quickload, so if you have any additional information to help us reproduce the issue, please let me know.

9.2 years ago
Ann

Great news on the paper! Bad news on the problem you are having with IGB.

This sounds suspiciously like a known bug with a related format, GTF. See https://jira.transvar.org/browse/IGBF-809

Does this sound similar to what you're experiencing?

A couple of questions:

  • Is the file on your Quickload site tabix-indexed?
  • Could you send us a copy? This would be very helpful with debugging. Email to aloraine@uncc.edu is fine, or Dropbox works too.

For now, as a workaround, you might try expanding the display region: zoom out to a bigger region and then click "Load Data".

I think the problem could be related to how loading by region works. When you select "Load Data", IGB retrieves all items from your larger file that map to your requested region. If parts of your gene model lie outside that region, they will not be retrieved, and your gene models, once they appear in IGB, will look incomplete or otherwise wrong, as in the image you showed above.

A potential fix would be for IGB to expand the load region behind the scenes to retrieve any bits and pieces that extend outside the requested region: the data in the requested region could be loaded first, and then we could go back for more, looking for any pieces that were missing.
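The load-by-region behavior and the proposed expansion can be sketched in Python as follows. This is not IGB's implementation: features are plain dicts, and the widening step assumes each transcript has an mRNA line spanning all of its exons, so any request that touches one exon also retrieves the mRNA and reveals the transcript's full extent. The loop widens the region until it stops growing.

```python
def overlapping(features, start, end):
    """Naive load-by-region: features overlapping [start, end], 1-based inclusive."""
    return [f for f in features if f["start"] <= end and f["end"] >= start]


def load_with_expansion(features, start, end):
    """Hypothetical expanded load: query the region, widen it to cover everything
    retrieved (e.g. an mRNA reaching past the request), and repeat until stable."""
    while True:
        hits = overlapping(features, start, end)
        if not hits:
            return hits
        new_start = min([start] + [f["start"] for f in hits])
        new_end = max([end] + [f["end"] for f in hits])
        if (new_start, new_end) == (start, end):
            return hits
        start, end = new_start, new_end


if __name__ == "__main__":
    # Invented example: one transcript with three exons, plus an exon far away.
    features = [
        {"type": "mRNA", "start": 100, "end": 600},
        {"type": "exon", "start": 100, "end": 200},
        {"type": "exon", "start": 300, "end": 400},
        {"type": "exon", "start": 500, "end": 600},
        {"type": "exon", "start": 5000, "end": 5100},
    ]
    naive = overlapping(features, 250, 450)
    expanded = load_with_expansion(features, 250, 450)
    print(sum(f["type"] == "exon" for f in naive))     # 1
    print(sum(f["type"] == "exon" for f in expanded))  # 3
```

Because the widening is transitive, overlapping neighboring genes can pull the region further out; a real implementation might cap the number of passes.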

Good luck with the paper and I hope we will have some good news for you soon!

-Ann


Hi Ann, thanks for the reply.

I'm afraid it's not the same issue. The level of zoom makes no difference. Also, why would zoom affect the same data differently depending on the file extension?

The data is not tabix-indexed. See the link for the data used in the example above.
