Hi, all
Recently I have been working with Ensembl GTF annotation files, trying to pick out the exons I need.
I am confused by the Ensembl exon IDs. For example, the three exons below all belong to gene ENSG00000000003 and have the same start and end coordinates.
chrX protein_coding exon 99890555 99890743 . - . gene_id "ENSG00000000003"; transcript_id "ENST00000373020"; exon_number "2"; gene_name "TSPAN6"; gene_biotype "protein_coding"; transcript_name "TSPAN6-001"; exon_id "ENSE00003662440";
chrX processed_transcript exon 99890555 99890743 . - . gene_id "ENSG00000000003"; transcript_id "ENST00000496771"; exon_number "2"; gene_name "TSPAN6"; gene_biotype "protein_coding"; transcript_name "TSPAN6-003"; exon_id "ENSE00003512331";
chrX processed_transcript exon 99890555 99890743 . - . gene_id "ENSG00000000003"; transcript_id "ENST00000494424"; exon_number "3"; gene_name "TSPAN6"; gene_biotype "protein_coding"; transcript_name "TSPAN6-002"; exon_id "ENSE00003512331";
My questions:
- Why are the first exon (ENSE00003662440) and the last two exons (ENSE00003512331) annotated with different exon IDs?
- Could anybody explain how exon IDs are assigned? (I couldn't find any documentation on the Ensembl site about exon annotation.)
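For anyone who wants to look for other cases like this, here is a minimal Python sketch that groups exon records by location and collects the distinct exon IDs at each one. It assumes tab-separated GTF lines with the usual `key "value";` attribute format; the function names `parse_gtf_line` and `exon_ids_by_location` are made up for illustration, and the sample records are the three lines above with some attributes left out.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def parse_gtf_line(line):
    """Split one GTF record into selected fixed columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    seqname, source, feature, start, end, score, strand, frame = cols[:8]
    attrs = dict(ATTR_RE.findall(cols[8]))
    return seqname, feature, int(start), int(end), strand, attrs

def exon_ids_by_location(lines):
    """Map (seqname, start, end, strand) to the set of exon_id values seen there."""
    groups = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        seqname, feature, start, end, strand, attrs = parse_gtf_line(line)
        if feature == "exon":
            groups[(seqname, start, end, strand)].add(attrs["exon_id"])
    return groups

def make_record(source, transcript_id, exon_id):
    """Hypothetical helper: rebuild one of the example lines with tab separators."""
    attributes = (
        'gene_id "ENSG00000000003"; '
        f'transcript_id "{transcript_id}"; exon_id "{exon_id}";'
    )
    return "\t".join(
        ["chrX", source, "exon", "99890555", "99890743", ".", "-", ".", attributes]
    )

records = [
    make_record("protein_coding", "ENST00000373020", "ENSE00003662440"),
    make_record("processed_transcript", "ENST00000496771", "ENSE00003512331"),
    make_record("processed_transcript", "ENST00000494424", "ENSE00003512331"),
]

# Report every location that carries more than one distinct exon ID.
for location, ids in exon_ids_by_location(records).items():
    if len(ids) > 1:
        print(location, sorted(ids))
```

On the three example records this reports a single location carrying two distinct exon IDs. It only finds such cases; it does not explain why the IDs differ.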
Thanks
I'm also a bit confused about these GTF files. What does "exon_version" mean? And what does the "1" at the start of every line mean?
Thanks in advance!
The "1" at the start of every line is the chromosome name (chromosome 1). exon_version is the stable identifier version for this exon.
You can find the details of the GTF format here: ftp://ftp.ensembl.org/pub/release-81/gtf/homo_sapiens/README
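To make the column layout concrete, here is a small Python sketch that labels the nine fields of a GTF line and splits the attribute column into key/value pairs. It assumes tab-separated input and the standard GTF column names; `label_gtf_fields` and `parse_attributes` are hypothetical helper names, and the example line is the first TSPAN6 record quoted earlier in this thread (so the first column is "chrX" rather than "1").

```python
# Standard names for the nine GTF columns.
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]

def label_gtf_fields(line):
    """Return a dict mapping each GTF column name to its raw string value."""
    fields = line.rstrip("\n").split("\t", 8)
    return dict(zip(GTF_COLUMNS, fields))

def parse_attributes(attribute):
    """Turn 'key "value"; key "value";' into a dict of strings."""
    attrs = {}
    for item in attribute.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip('"')
    return attrs

# First example record from this thread, rebuilt with tab separators.
line = "\t".join([
    "chrX", "protein_coding", "exon", "99890555", "99890743", ".", "-", ".",
    'gene_id "ENSG00000000003"; transcript_id "ENST00000373020"; '
    'exon_number "2"; gene_name "TSPAN6"; gene_biotype "protein_coding"; '
    'transcript_name "TSPAN6-001"; exon_id "ENSE00003662440";',
])

record = label_gtf_fields(line)
attrs = parse_attributes(record["attribute"])

print(record["seqname"])          # the chromosome / sequence name column
print(attrs["exon_id"])           # stable exon identifier
print(attrs.get("exon_version"))  # None when the line has no exon_version
```

In files that do include exon_version, it would show up as one more key in the attribute dict, next to exon_id.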