Disclaimer: I tried to post this on Bioconductor support, but it won't allow me. I even added an entire paragraph in "English language", but it still wouldn't let me post.
Hi everyone,
I am using DEXSeq for exon quantification. I ran dexseq_prepare_annotation.py to convert the GENCODE v23 GTF to GFF like this:
python2.7 ~/path/to/R/library/DEXSeq/python_scripts/dexseq_prepare_annotation.py gencode.v23.annotation.gtf gencode.v23.annotation.gff
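My understanding, which is an assumption I have not checked against the script's source, is that the conversion does not copy exons one-to-one. Instead it pools the exons of all transcripts of a gene and cuts them wherever any transcript's exon starts or ends, so a single exon can end up as several exonic parts. A minimal sketch of that idea, using a hypothetical `flatten_exons` helper and toy transcripts `txA`/`txB` (not real IDO2 data):

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates (GFF-style)
Part = Tuple[int, int, FrozenSet[str]]


def flatten_exons(transcripts: Dict[str, List[Interval]]) -> List[Part]:
    """Split the exons of several transcripts into non-overlapping exonic
    parts, each labelled with the set of transcripts that contain it.

    Simplified illustration only; this is NOT the dexseq_prepare_annotation.py
    code and ignores things like merging overlapping genes.
    """
    # Every position where some exon starts, or one past where it ends.
    cuts = set()
    for exons in transcripts.values():
        for start, end in exons:
            cuts.add(start)
            cuts.add(end + 1)
    bounds = sorted(cuts)

    parts: List[Part] = []
    for left, right in zip(bounds, bounds[1:]):
        # All exon boundaries are cut points, so each segment [left, right - 1]
        # is either fully inside or fully outside any given exon.
        covering = frozenset(
            tx for tx, exons in transcripts.items()
            if any(s <= left and right - 1 <= e for s, e in exons)
        )
        if not covering:
            continue  # intronic in every transcript: not an exonic part
        # Merge with the previous part if contiguous and shared by the same set.
        if parts and parts[-1][1] == left - 1 and parts[-1][2] == covering:
            parts[-1] = (parts[-1][0], right - 1, covering)
        else:
            parts.append((left, right - 1, covering))
    return parts
```

With `flatten_exons({"txA": [(100, 200), (300, 400)], "txB": [(150, 200), (300, 350)]})`, each transcript has two exons, yet the result is four exonic parts: 100-149 (txA only), 150-200 (both), 300-350 (both) and 351-400 (txA only).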
For IDO2, which has gene ID ENSG00000188676, I got 18 exonic parts in the GFF:
grep 'ENSG00000188676' gencode.v23.annotation.gff
chr8 dexseq_prepare_annotation.py aggregate_gene 39934614 40016391 . + . gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39934614 39934954 . + . transcripts "ENST00000343295.8"; exonic_part_number "001"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39934955 39935218 . + . transcripts "ENST00000343295.8+ENST00000502986.2"; exonic_part_number "002"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39949149 39949165 . + . transcripts "ENST00000343295.8+ENST00000502986.2"; exonic_part_number "003"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39949166 39949264 . + . transcripts "ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "004"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39963608 39963703 . + . transcripts "ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "005"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39979067 39979186 . + . transcripts "ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "006"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39982652 39982770 . + . transcripts "ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "007"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39984951 39985507 . + . transcripts "ENST00000343295.8"; exonic_part_number "008"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39985508 39985522 . + . transcripts "ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "009"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39985523 39986460 . + . transcripts "ENST00000343295.8"; exonic_part_number "010"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39986900 39987085 . + . transcripts "ENST00000343295.8"; exonic_part_number "011"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39987743 39987870 . + . transcripts "ENST00000418094.1"; exonic_part_number "012"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39987871 39987970 . + . transcripts "ENST00000418094.1+ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "013"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 39989721 39989838 . + . transcripts "ENST00000418094.1+ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "014"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 40005327 40005378 . + . transcripts "ENST00000389060.8+ENST00000502986.2"; exonic_part_number "015"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 40013565 40013713 . + . transcripts "ENST00000418094.1+ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "016"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 40015247 40015605 . + . transcripts "ENST00000418094.1+ENST00000343295.8+ENST00000389060.8+ENST00000502986.2"; exonic_part_number "017"; gene_id "ENSG00000188676.13"
chr8 dexseq_prepare_annotation.py exonic_part 40015606 40016391 . + . transcripts "ENST00000418094.1+ENST00000343295.8+ENST00000502986.2"; exonic_part_number "018"; gene_id "ENSG00000188676.13"
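To see how these 18 parts relate to individual transcripts, the records can be tallied per transcript. This is a minimal sketch with a hypothetical `parts_per_transcript` helper; it assumes the attribute layout shown above (`transcripts "A+B"`, `gene_id "..."`) and accepts tab- or space-separated columns:

```python
import re
from collections import Counter
from typing import Iterable

TRANSCRIPTS_RE = re.compile(r'transcripts "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def parts_per_transcript(lines: Iterable[str], gene_id: str) -> Counter:
    """Count how many exonic_part records of one gene each transcript covers.

    gene_id is compared without its version suffix, so "ENSG00000188676"
    matches "ENSG00000188676.13".
    """
    wanted = gene_id.split(".")[0]
    counts: Counter = Counter()
    for line in lines:
        # Nine GFF columns; the attribute column may itself contain spaces.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9 or fields[2] != "exonic_part":
            continue  # skip aggregate_gene and malformed lines
        attrs = fields[8]
        gene = GENE_RE.search(attrs)
        if gene is None or gene.group(1).split(".")[0] != wanted:
            continue
        tx = TRANSCRIPTS_RE.search(attrs)
        if tx is not None:
            # Transcripts sharing a part are joined with "+".
            counts.update(tx.group(1).split("+"))
    return counts
```

For example, `parts_per_transcript(open("gencode.v23.annotation.gff"), "ENSG00000188676")` would give, for each IDO2 transcript, the number of exonic parts it spans.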
However, the UCSC Genome Browser shows that IDO2 has 10 exons. Why does my GFF show 18 exonic parts? Is there an issue with the conversion?
Thank you very much for the explanation!!