Question

Prepare Gtf For Dexseq

3

12.6 years ago

camelbbs ▴ 710

Hi,

Has anyone used the DEXSeq Python script to generate the GTF files?

I can run dexseq_prepare_annotation.py on knowngene.gtf, but not on refgene.gtf.

I don't know why. Here is the error:

python ~/dexseq_prepare_annotation.py refflat_hg19.gtf new_refflat_hg19.gtf

File "/che/dexseq_prepare_annotation.py", line 89, in <module>

assert l[i].iv.end <= l[i+1].iv.start, str(l[i+1]) + " starts too early"

AssertionError: <GenomicFeature: exonic_part 'CFB' at chr6_dbb_hap3: 3199308 -> 3199650 (strand '+')> starts too early

thanks a lot!

che

rna-seq • 7.5k views

ADD COMMENT • link updated 12.6 years ago by Sudeep ★ 1.7k • written 12.6 years ago by camelbbs ▴ 710

score 5 · Answer 1 · 2013-01-22

12.6 years ago

Sudeep ★ 1.7k

Have you seen this workaround posted on the Bioconductor mailing list?

ADD COMMENT • link 12.6 years ago by Sudeep ★ 1.7k

0

Thanks. I am not familiar with Python. When I replaced line 28 in the script, I found that f.iv.chrom and f.iv.strand were not defined, and the script fails with an error like IndentationError: unexpected indent. Can you help with that?

ADD REPLY • link 12.6 years ago by camelbbs ▴ 710

0

Thanks, I think I solved it. The Python script is indented with spaces, not tabs.

ADD REPLY • link 12.6 years ago by camelbbs ▴ 710

0

Hi,

I have exactly the same problem.

Can you please show what you changed in the dexseq_prepare_annotation script, with a few lines of code before and after the change?

I'm trying to generate a GFF from the hg19 GTF.

Thanks,

efrat

ADD REPLY • link 10.0 years ago by efratdahan21 • 0

score 1 · Answer 2 · 2013-01-21

12.6 years ago

Istvan Albert 103k

I have not used this tool, but from the error message this appears to be a check that guards against overlapping exons.

The start of each exon must not come before the end of the previous exon.
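
To make that check concrete, here is a minimal sketch in plain Python. It is not the DEXSeq script itself: it only mirrors the condition in the failing assert (`end <= next start`), assuming half-open (start, end) intervals given in the order the script would see them. The function name `find_early_starts` and the sample coordinates are made up for illustration.

```python
def find_early_starts(intervals):
    """Return (previous, current) pairs where current starts before previous ends.

    intervals: list of (start, end) tuples, half-open, in processing order.
    """
    problems = []
    for prev, cur in zip(intervals, intervals[1:]):
        # The script asserts prev.end <= cur.start; report where that fails.
        if prev[1] > cur[0]:
            problems.append((prev, cur))
    return problems


# Hypothetical coordinates: the third part starts before the second one ends.
parts = [(100, 200), (200, 300), (250, 400)]
print(find_early_starts(parts))  # [((200, 300), (250, 400))]
```

Adjacent parts that merely touch, such as (100, 200) and (200, 300), pass the check; only a true overlap is reported.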

ADD COMMENT • link 12.6 years ago by Istvan Albert 103k

score 1 · Answer 3 · 2013-01-22

I believe you used the UCSC version of the GTF, right? It seems that the author only tested this tool on the Ensembl version of the GTF.

The problem with the GTF is that DEXSeq assumes that all transcripts with the same gene_id come from a single locus of transcription. For UCSC GTF files this is not true. A locus of transcription is defined by the tss_id (transcription start site ID). Change tss_id to gene_id, rename the original gene_id to something else in the attributes column, and try again.
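
The attribute swap described above could be sketched roughly as follows. This is an untested-on-real-data illustration, not part of DEXSeq: it assumes the GTF actually carries a tss_id attribute on the relevant lines (lines without one are left unchanged), and the replacement key `orig_gene_id`, the function names, and the sample line are all hypothetical.

```python
import re


def swap_gene_and_tss(line, old_key="orig_gene_id"):
    """Rename gene_id -> old_key and tss_id -> gene_id in one GTF line."""
    if line.startswith("#"):
        return line  # header/comment line
    ending = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # Leave malformed lines and lines without a tss_id untouched.
    if len(fields) != 9 or not re.search(r'\btss_id "', fields[8]):
        return line
    attrs = re.sub(r'\bgene_id "', old_key + ' "', fields[8])
    attrs = re.sub(r'\btss_id "', 'gene_id "', attrs)
    fields[8] = attrs
    return "\t".join(fields) + ending


def rewrite_gtf(in_path, out_path):
    """Apply the swap to every line of in_path and write the result to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(swap_gene_and_tss(line))


# Made-up example line, for illustration only.
example = ('chr6\trefFlat\texon\t100\t200\t.\t+\t.\t'
           'gene_id "CFB"; transcript_id "tx1"; tss_id "TSS1";\n')
print(swap_gene_and_tss(example), end="")
```

After the rewrite, the new file can be passed to dexseq_prepare_annotation.py in place of the original GTF; the original gene name stays available under the renamed key.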