Prepare GTF for DEXSeq
11.9 years ago
camelbbs ▴ 710

Hi,

Has anyone used the DEXSeq Python script to prepare GTF files?

dexseq_prepare_annotation.py works for me on knowngene.gtf, but it fails on refgene.gtf.

I don't know why. The error is here:

python ~/dexseq_prepare_annotation.py refflat_hg19.gtf new_refflat_hg19.gtf

File "/che/dexseq_prepare_annotation.py", line 89, in <module>

assert l[i].iv.end <= l[i+1].iv.start, str(l[i+1]) + " starts too early"

AssertionError: <GenomicFeature: exonic_part 'CFB' at chr6_dbb_hap3: 3199308 -> 3199650 (strand '+')> starts too early

thanks a lot!

che

rna-seq • 7.2k views
11.9 years ago
Sudeep ★ 1.7k

Have you seen this workaround posted on the Bioconductor mailing list?


Thanks. I am not familiar with Python. When I replaced line 28 in the script, I found that f.iv.chrom and f.iv.strand are not defined, and I get an error like IndentationError: unexpected indent. Can you help with that?


Thanks, I think I solved it. The Python script is indented with spaces, not tabs.
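For anyone hitting the same IndentationError after pasting the workaround, here is a minimal sketch of how to find lines that were indented with tabs in a script that otherwise uses spaces. The function name lines_with_tab_indent is made up for this illustration and is not part of DEXSeq or HTSeq.

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Slice off everything after the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Hypothetical snippet: the third line was pasted with a tab.
sample = "for f in feats:\n    a = 1\n\tb = 2\n"
print(lines_with_tab_indent(sample))
```

To check the real script, read it with open(...).read() and pass the text to the function, then re-indent the reported lines with spaces. The standard library's tabnanny module (python -m tabnanny script.py) performs a similar check.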


Hi,

I have exactly the same problem.

Can you please show what you changed in the dexseq_prepare_annotation script, with a few lines of code before and after the change?

I'm trying to generate a GFF from the hg19 GTF.

Thanks,

efrat

11.9 years ago

I have not used this tool, but from the error message it appears to be a sanity check against overlapping exonic parts.

The assertion requires that the start of each exonic part is not smaller than the end of the previous one.
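As a rough illustration of what that assertion tests, here is a sketch using plain (start, end) tuples in place of the HTSeq feature objects from the traceback. It assumes the parts are already sorted by start and that the intervals are half-open, so a part may begin exactly where the previous one ends. The name first_early_start is invented for this example.

```python
def first_early_start(parts):
    """Return the index of the first part that starts before the previous
    part ends, or None if every part passes the check."""
    for i in range(len(parts) - 1):
        previous_end = parts[i][1]
        next_start = parts[i + 1][0]
        # Same condition as the assertion: previous end <= next start.
        if previous_end > next_start:
            return i + 1
    return None


# Adjacent parts pass; overlapping parts are reported.
print(first_early_start([(100, 200), (200, 300)]))
print(first_early_start([(100, 200), (150, 300)]))
```

In the question, the part for CFB on chr6_dbb_hap3 is the one that fails this condition.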

11.9 years ago
Ge ▴ 80

I believe you used the UCSC version of the GTF, right? It seems that the author only tested this tool on the Ensembl version of the GTF.

The problem with the GTF is that DEXSeq assumes that all transcripts with the same gene_id come from a single locus of transcription. For the UCSC GTF files this is not true. There, a locus of transcription is defined by the tss_id (transcription start site ID). In the attributes column, rename tss_id to gene_id and rename the original gene_id to something else, then try again.
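The attribute swap described above could be sketched like this. It assumes each record has nine tab-separated fields and that the attributes column carries both gene_id and tss_id; lines without a tss_id are passed through unchanged. The function name swap_gene_and_tss and the replacement key orig_gene_id are made up for this illustration.

```python
import re


def swap_gene_and_tss(line, spare_key="orig_gene_id"):
    """Rename gene_id to spare_key and tss_id to gene_id in one GTF line."""
    if line.startswith("#") or not line.strip():
        return line  # comment or blank line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or 'tss_id "' not in fields[8]:
        return line  # not a standard record, or no tss_id to promote
    attrs = fields[8]
    # Move the old gene_id out of the way first so the keys never collide.
    attrs = re.sub(r'\bgene_id "', spare_key + ' "', attrs)
    attrs = re.sub(r'\btss_id "', 'gene_id "', attrs)
    fields[8] = attrs
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline


# Hypothetical record for illustration only.
record = ('chr6\trefGene\texon\t100\t200\t.\t+\t.\t'
          'gene_id "CFB"; transcript_id "NM_1"; tss_id "TSS42";\n')
print(swap_gene_and_tss(record), end="")
```

To convert a whole file, apply the function to every line of the input GTF, write the results to a new file, and run dexseq_prepare_annotation.py on that file.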
