UCSC track hub: how to best include a GTF / GFF file
4.5 years ago
nlehmann ▴ 150

Hello,

I have set up a UCSC track hub that includes annotations originally in GTF or GFF format. I followed the steps described here: https://genome.ucsc.edu/goldenPath/help/bigGenePred.html (example 4: GTF (or GFF) to BigGenePred) to obtain a binary bigGenePred for each GTF / GFF.

It works fine, except that when visualizing the result, I lose the intron and strand orientation information: for each annotated gene, I get a solid line going from the 5' end to the 3' end of the gene.

One example of what I get (the top 3 light blue lines are from one of my GTF annotations converted to binary bigGenePred):

Track hub screen capture

I really want a finer granularity that shows exons, introns, and strand orientation, so I tried uploading one of my GTF files as a custom track. This works: I do not lose data from the annotation and get to see all the details. However, I could not find a way to integrate a custom track into a track hub.

Result with the custom track (top 3 black lines are from one of my GTF annotation): Custom track screen capture

So this is my question: what is the best way to integrate a GTF file into a track hub without losing any detail?

Thanks for the help.

UCSC trackhub annotation • 2.9k views

Hello,

bigGenePred does support intron/strand information, so converting from a GTF/GFF should not be a problem. My initial suspicion is that the problem lies in how the trackDb stanza is declared in the hub: are you stating type bigGenePred? Another possibility is the input GTF file or the conversion (stating the correct bigBed 12+8), but that's less likely. Essentially, from your first screenshot it looks like the Genome Browser is displaying your file as a BED3.

If you would like to email us a copy of the GTF file as well as a link to the hub to our private mailing list (genome-www@soe.ucsc.edu) we could take a look. Note that only internal Genome Browser staff can see the contents of the message.
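
To make the trackDb check concrete, here is a minimal Python sketch that scans trackDb.txt content and reports tracks declared as plain "bigBed" with no field count, which would be consistent with the BED3-like rendering described above. The function names and the example stanzas (track names, file names, labels) are hypothetical; only the setting names (track, bigDataUrl, shortLabel, longLabel, type) are standard trackDb settings.

```python
def parse_trackdb(text):
    """Split trackDb.txt content into stanzas (one dict of setting -> value each)."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # skip comment lines
        parts = line.split(None, 1)
        current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


def plain_bigbed_tracks(text):
    """Return names of tracks whose type is exactly 'bigBed' (no field count)."""
    return [s.get("track", "?") for s in parse_trackdb(text)
            if s.get("type", "") == "bigBed"]


# Hypothetical example: the first stanza has the suspected mistake.
EXAMPLE = """\
track genesWrong
bigDataUrl genes.bb
shortLabel Genes (wrong)
longLabel Gene annotation declared as plain bigBed
type bigBed

track genesRight
bigDataUrl genes.bb
shortLabel Genes (right)
longLabel Gene annotation declared as bigGenePred
type bigGenePred
"""

print(plain_bigbed_tracks(EXAMPLE))
```

Any track reported here that was built from a gene annotation is a candidate for switching to "type bigGenePred".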


Thanks for your support, it's solved! I stated "bigBed" instead of "bigGenePred" in the trackDb. I got confused by the last command, "bedToBigBed", in the conversion from GTF to bigGenePred.
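
For readers hitting the same confusion, the conversion can be sketched as a list of commands. This is a rough Python sketch that only builds the command lines, modelled on the UCSC bigGenePred help page (example 4); the helper name, the intermediate file names, and the use of "sort -o" instead of a shell pipe are assumptions made for illustration. The point is that the final tool is bedToBigBed, yet the resulting .bb file is still declared as "type bigGenePred" in trackDb.

```python
from pathlib import Path


def gtf_to_biggenepred_commands(gtf, chrom_sizes, as_file="bigGenePred.as"):
    """Build (but do not run) the commands turning a GTF into a bigGenePred file.

    File names derived from the GTF stem are illustrative choices.
    """
    stem = str(Path(gtf).with_suffix(""))
    gene_pred = f"{stem}.genePred"
    unsorted_bgp = f"{stem}.unsorted.bgpInput"
    sorted_bgp = f"{stem}.bgpInput"
    big_bed = f"{stem}.bb"
    return [
        # 1. GTF -> extended genePred
        ["gtfToGenePred", "-genePredExt", str(gtf), gene_pred],
        # 2. genePred -> bigGenePred text (BED12 plus 8 extra columns)
        ["genePredToBigGenePred", gene_pred, unsorted_bgp],
        # 3. bedToBigBed expects input sorted by chromosome, then start
        ["sort", "-k1,1", "-k2,2n", "-o", sorted_bgp, unsorted_bgp],
        # 4. Pack into an indexed binary file; the trackDb type stays bigGenePred
        ["bedToBigBed", "-type=bed12+8", "-tab", f"-as={as_file}",
         sorted_bgp, str(chrom_sizes), big_bed],
    ]


for cmd in gtf_to_biggenepred_commands("genes.gtf", "chrom.sizes"):
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the UCSC utilities and bigGenePred.as are available locally.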


Can't you use the GFF or GTF files directly in the track hub? Some important features were probably missing from the GFF/GTF files you converted. I suggest you standardise them with agat_sp_gxf2gxf.pl from AGAT and retry the conversion. If it still does not work, you could also add intron features using agat_sp_add_introns.pl.


Can't you use the GFF or GTF files directly in the track hub?

Nope, it doesn't seem so. The UCSC manual says:

Custom tracks can be constructed from a wide range of data types; hub tracks are limited to compressed binary indexed formats that can be remotely hosted. However, the custom tracks utility does not offer the data persistence and track configurability provided by the track hub.

By compressed binary they mean one of these (from this part of UCSC manual):

  • bam/cram: Compressed Sequence Alignment/Map tracks
  • bigBed: Item or region tracks
  • bigBarChart: Bar charts of categorical variables displayed over genomic regions
  • bigChain: Genome-wide Pairwise Alignments
  • bigGenePred: Gene Annotations
  • bigInteract: Pairwise interactions
  • bigNarrowPeak: Peaks
  • bigMaf: Multiple Alignments
  • bigPsl: Pairwise Alignments
  • bigWig: Signal graphing tracks
  • hic: Hi-C contact matrices
  • halSnake: HAL Snake Format
  • vcfTabix: Variant Call Format

Some important features were probably missing from the GFF/GTF

I have exactly the same input data in both cases (track hub and custom track), so no data is missing from the GTF/GFF. The problem is the conversion to bigGenePred, which produces a loss of data (same as converting from GTF to a BED file).
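
As a point of reference for the data-loss concern, a bigGenePred row is a BED12 row plus 8 extra columns, so strand and exon blocks are carried in the strand, blockSizes, and chromStarts fields. The Python sketch below recovers exons and introns from one such row; the function name and the example row are made up for illustration.

```python
def exons_and_introns(bed12_line):
    """Return (strand, exons, introns) from one tab-separated BED12-style row.

    Coordinates are 0-based, half-open, as in BED.
    """
    fields = bed12_line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    strand = fields[5]
    # blockSizes / chromStarts are comma-separated, possibly with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    exons = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    # An intron is the gap between the end of one exon and the start of the next.
    introns = [(prev_end, next_start)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    return strand, exons, introns


# Hypothetical transcript with three exons on the minus strand.
ROW = "\t".join([
    "chr1", "1000", "2000", "tx1", "0", "-",
    "1000", "2000", "0", "3", "100,200,300,", "0,400,700,",
])

strand, exons, introns = exons_and_introns(ROW)
print(strand, exons, introns)
```

If the intermediate text file produced during conversion contains such block fields, the exon/intron structure has survived the conversion and the display issue lies elsewhere.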

If it still does not work, you could also add intron features using agat_sp_add_introns.pl.

Thanks for the help and suggestions! I will try it if I cannot find a more standard solution. There are lots of detailed annotations in UCSC, so I guess there must be some UCSC-internal way of doing this.

4.5 years ago
Luis Nassar ▴ 670

Glad it was solved! I'm just going to copy the response as an answer so the post doesn't show up unanswered:

Hello,

bigGenePred does support intron/strand information, so converting from a GTF/GFF should not be a problem. My initial suspicion is that the problem lies in how the trackDb stanza is declared in the hub: are you stating type bigGenePred? Another possibility is the input GTF file or the conversion (stating the correct bigBed 12+8), but that's less likely. Essentially, from your first screenshot it looks like the Genome Browser is displaying your file as a BED3.

If you would like to email us a copy of the GTF file as well as a link to the hub to our private mailing list (genome-www@soe.ucsc.edu) we could take a look. Note that only internal Genome Browser staff can see the contents of the message.
