Question

Has anybody used TEToolkit successfully to quantify transposable elements?


8.9 years ago

Anna S

Hello,

I am trying to quantify transposable elements, but TEToolkit exits with the error shown below. Perhaps it does not like the transposable element GTF file that I created manually for yeast? I know the syntax of this GTF file is correct, since I have built analogous GTF files for both the rabbit and mouse papillomaviruses and used them in successful TopHat and Cufflinks runs. What I am unsure about is the content of the file: how should each transposon be specified? For example, NCBI has one full-length transposon annotated for yeast, but the others are listed only as fragments flanking a gene.

Has anyone run TEToolkit successfully who could shed some light on this? Thanks a lot!

Anna

-bash-4.1$ ./TEtranscripts  --format BAM --mode multi -t ../../../../HuiLing_4567_030716/HLC1.trim.bam -c ../../../../HuiLing_4567_030716/HLC2.trim.bam --project TE_2v1  --GTF ../../../../ref/sacCerR64.gtf --TE ../../../../ref/sacCerR64_virusesalltransposons_only.gtf
INFO  @ Mon, 16 May 2016 15:01:03:
# ARGUMENTS LIST:
# name = TE_2v1
# treatment files = ['../../../../HuiLing_4567_030716/HLC1.trim.bam']
# control files = ['../../../../HuiLing_4567_030716/HLC2.trim.bam']
# GTF file = ../../../../ref/sacCerR64.gtf
# TE file = ../../../../ref/sacCerR64_virusesalltransposons_only.gtf
# multi-mapper mode = multi
# stranded = yes
# normalization = DESeq_default (rpm: Reads Per Million mapped; quant: Quantile normalization)
# FDR cutoff = 5.00e-02
# fold-change cutoff =  1.00
# read count cutoff = 1
# number of iteration = 10
# Alignments grouped by read ID = True


INFO  @ Mon, 16 May 2016 15:01:03: Processing GTF files ...

INFO  @ Mon, 16 May 2016 15:01:03: Building gene index .......

INFO  @ Mon, 16 May 2016 15:01:04: Done building gene index ......

INFO  @ Mon, 16 May 2016 15:01:04:
Building TE index .......

Error in building gene/TE index


The GitHub issue tracker would be a more appropriate place to ask for help.

Matt Shirley

score 3 · Answer 1 · 2016-05-17

The documentation says it relies on "specially curated GTF files", which they provide here for a few model species. Having looked at the files, I can say it may be difficult to reproduce this format exactly, so I would post an issue on GitHub as previously suggested. I'm posting this as an answer because you won't be able to find these files at UCSC or elsewhere.

As an aside, I don't fully agree with using GTF for this purpose. GTF was meant to describe the coding features of genes and to be more stringent than GFF, so this is a bit odd. I'm sure they rely on external tools that require GTF, but there are a couple of issues. Creating new attributes doesn't bother me; what does is using 'exon' to describe a transposon and the incorrect use of TE classification terms. I'm not trying to criticize this specific tool, but we have to be careful how we extend tools and formats for other purposes. This raises some flags for me because it breaks from the specification, and does so in a way that doesn't describe the biology. To be fair, it is quite difficult to describe transposon properties with tools and formats that were not originally intended for that use, so some engineering is usually necessary.
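Before opening an issue, it may help to check a hand-built TE GTF for the two things this thread keeps coming back to: the column count and the attributes in column 9. The sketch below is a hypothetical Python checker, not part of TEToolkit. The required attribute list (`gene_id`, `transcript_id`, `family_id`, `class_id`) is an assumption based on the curated files and on the reports in this thread, not on a documented requirement.

```python
import re
import sys

# Assumption: these are the attributes TEtranscripts expects in column 9 of the TE GTF.
REQUIRED_ATTRS = ("gene_id", "transcript_id", "family_id", "class_id")

# Matches key "value" pairs such as: gene_id "Ty1";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(column9):
    """Return a dict mapping attribute names to values for one GTF column 9."""
    return dict(ATTR_RE.findall(column9))


def find_problems(lines):
    """Yield (line_number, message) for lines that look malformed."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            yield lineno, f"expected 9 tab-separated columns, found {len(fields)}"
            continue
        attrs = parse_attributes(fields[8])
        missing = [key for key in REQUIRED_ATTRS if key not in attrs]
        if missing:
            yield lineno, "missing attributes: " + ", ".join(missing)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_te_gtf.py your_te.gtf   (script name is hypothetical)
    with open(sys.argv[1]) as handle:
        for lineno, message in find_problems(handle):
            print(f"line {lineno}: {message}")
```

Each printed line points at a record that is a plausible cause of the index failure. This is still a guess, since TEtranscripts itself only reports the generic "Error in building gene/TE index".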

score 1 · Answer 2 · 2016-05-16


8.9 years ago

Anna S

The instructions say to use the UCSC RepeatMasker annotation, which is not available for yeast. Does anyone know whether a RepeatMasker annotation of the yeast genome has already been produced and is publicly available? Thanks!
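If a RepeatMasker run on the yeast genome turns up (or you run RepeatMasker yourself), its `.out` table could in principle be converted into a GTF shaped like the curated files. The sketch below assumes the standard 15-column `.out` layout: query sequence in column 5, begin/end in columns 6-7, `+`/`C` strand in column 9, repeat name in column 10, and `class/family` in column 11. It writes `exon` features with `gene_id`, `transcript_id`, `family_id` and `class_id` attributes. The `_dupN` suffix that makes each `transcript_id` unique, the `rmsk` source label and the use of the SW score in the score column are hypothetical choices, not taken from the TEToolkit documentation.

```python
from collections import Counter


def rmsk_out_to_gtf(lines, source="rmsk"):
    """Yield TE GTF lines built from RepeatMasker .out rows (a sketch)."""
    seen = Counter()  # hits per repeat name, used to build unique transcript_ids
    for line in lines:
        fields = line.split()
        # Header and blank lines are skipped: data rows have >= 15 columns
        # and start with an integer Smith-Waterman score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        score, chrom = fields[0], fields[4]
        # Assumed 1-based, inclusive coordinates, as in GTF, so no shift is applied.
        start, end = fields[5], fields[6]
        strand = "-" if fields[8] == "C" else "+"  # RepeatMasker writes "C" for minus
        name, class_family = fields[9], fields[10]
        # "LTR/Copia" -> class "LTR", family "Copia"; entries without "/" reuse the class.
        class_id, _, family_id = class_family.partition("/")
        family_id = family_id or class_id
        seen[name] += 1
        transcript_id = f"{name}_dup{seen[name]}"  # hypothetical naming scheme
        attrs = (f'gene_id "{name}"; transcript_id "{transcript_id}"; '
                 f'family_id "{family_id}"; class_id "{class_id}";')
        yield "\t".join([chrom, source, "exon", start, end, score, strand, ".", attrs])
```

For example, `print("\n".join(rmsk_out_to_gtf(open("genome.fa.out"))))` would print one GTF line per repeat hit (the file name is a placeholder). Whether TEtranscripts accepts the result as-is would still need to be tested.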


score 0 · Answer 3 · 2016-07-12

Hi!

I am having exactly the same problem. I built the TE GTF for macaque exactly as specified on the Hammell lab website, and it has the same structure as their files. I used RepeatMasker from UCSC, and I even used the same syntax for transcript_id to give each TE a unique name.

TEtranscripts exits with the error "Error in building gene/TE index".

I can't figure out the problem.

Did you solve this issue in the end?

Thank you, Camille
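A duplicated `transcript_id` is an easy mistake when building these files by hand, so one quick way to confirm that every TE instance really has a unique name is to count the IDs. This is a hypothetical helper, not part of TEtranscripts. Whether duplicate IDs actually trigger this particular error is an assumption; the error message does not say.

```python
import re
from collections import Counter

# Captures the value of transcript_id "..." in GTF column 9.
TRANSCRIPT_RE = re.compile(r'transcript_id\s+"([^"]+)"')


def duplicated_transcript_ids(lines):
    """Return the sorted transcript_id values that occur on more than one line."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        match = TRANSCRIPT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(tid for tid, n in counts.items() if n > 1)
```

Passing an open file handle works too, e.g. `duplicated_transcript_ids(open("te.gtf"))`; an empty list means every ID is unique.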

score 0 · Answer 4 · 2016-08-23


8.6 years ago

nikulina

Hi!

Do you have 'family_id' and 'class_id' in the 9th column of your GTF file? In my case, adding those as 'dummy' fields resolved the issue.
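A minimal sketch of that workaround appends placeholder `family_id` and `class_id` attributes to any GTF line that lacks them. The placeholder value `"Unknown"` and the function name are hypothetical. Real family/class labels (for example from RepeatMasker's class/family column) would be more informative if you have them.

```python
import re

FAMILY_RE = re.compile(r'\bfamily_id\s+"')
CLASS_RE = re.compile(r'\bclass_id\s+"')


def add_dummy_ids(line, family="Unknown", class_name="Unknown"):
    """Append placeholder family_id/class_id attributes to one GTF line if absent."""
    if not line.strip() or line.startswith("#"):
        return line  # leave blank and comment lines untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a 9-column GTF record; leave it for manual inspection
    attrs = fields[8].strip()
    if attrs and not attrs.endswith(";"):
        attrs += ";"  # close the last attribute before appending new ones
    if not FAMILY_RE.search(attrs):
        attrs += f' family_id "{family}";'
    if not CLASS_RE.search(attrs):
        attrs += f' class_id "{class_name}";'
    fields[8] = attrs.strip()
    return "\t".join(fields) + "\n"
```

To rewrite a whole file: `with open("te.gtf") as src, open("te.fixed.gtf", "w") as dst: dst.writelines(add_dummy_ids(l) for l in src)` (file names are placeholders). The function is idempotent, so running it twice does not duplicate the attributes.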
