This information is available from the UCSC public MySQL server (the database is also available for download):
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e 'select * from knownGene as K ,kgXref as X where K.name=X.kgId limit 5\G'
*************************** 1. row ***************************
name: uc001aaa.2
chrom: chr1
strand: +
txStart: 1115
txEnd: 4121
cdsStart: 1115
cdsEnd: 1115
exonCount: 3
exonStarts: 1115,2475,3083,
exonEnds: 2090,2584,4121,
proteinID:
alignID: uc001aaa.2
kgID: uc001aaa.2
mRNA: BC032353
spID:
spDisplayID:
geneSymbol: BC032353
refseq:
protAcc:
description: Homo sapiens cDNA FLJ36366 fis, clone THYMU2007824.
*************************** 2. row ***************************
name: uc009vip.1
chrom: chr1
strand: +
txStart: 1115
txEnd: 4272
cdsStart: 1115
cdsEnd: 1115
exonCount: 2
exonStarts: 1115,2475,
exonEnds: 2090,4272,
proteinID:
alignID: uc009vip.1
kgID: uc009vip.1
mRNA: AX748260
spID:
spDisplayID:
geneSymbol: AX748260
refseq:
protAcc:
description: Homo sapiens cDNA FLJ36366 fis, clone THYMU2007824.
*************************** 3. row ***************************
name: uc009vjg.1
chrom: chr1
strand: +
txStart: 19417
txEnd: 20957
cdsStart: 19417
cdsEnd: 19417
exonCount: 3
exonStarts: 19417,20426,20838,
exonEnds: 19902,20530,20957,
proteinID:
alignID: uc009vjg.1
kgID: uc009vjg.1
mRNA: BC048429
spID:
spDisplayID:
geneSymbol: BC048429
refseq:
protAcc:
description: Homo sapiens cDNA clone IMAGE:5275617, **** WARNING: chimeric clone ****.
*************************** 4. row ***************************
name: uc001aal.1
chrom: chr1
strand: +
txStart: 58953
txEnd: 59871
cdsStart: 58953
cdsEnd: 59871
exonCount: 1
exonStarts: 58953,
exonEnds: 59871,
proteinID: Q8NH21
alignID: uc001aal.1
kgID: uc001aal.1
mRNA: NM_001005484
spID: Q8NH21
spDisplayID: OR4F5_HUMAN
geneSymbol: OR4F5
refseq: NM_001005484
protAcc: NP_001005484
description: olfactory receptor, family 4, subfamily F,
*************************** 5. row ***************************
name: uc009vjh.1
chrom: chr1
strand: +
txStart: 55424
txEnd: 59692
cdsStart: 58953
cdsEnd: 59691
exonCount: 3
exonStarts: 55424,55751,58899,
exonEnds: 55436,55834,59692,
proteinID: Q52R92
alignID: uc009vjh.1
kgID: uc009vjh.1
mRNA: AY972817
spID: Q52R92
spDisplayID: Q52R92_HUMAN
geneSymbol: OR4F5
refseq: NM_001005484
protAcc: NP_001005484
description: olfactory receptor, family 4, subfamily F,
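To compare exons between transcripts, the comma-separated exonStarts and exonEnds strings first have to be split into intervals. Here is a minimal Python sketch of that step. It assumes the usual UCSC convention of 0-based, half-open coordinates, and that a transcript with cdsStart equal to cdsEnd is non-coding (as in rows 1 to 3 above). The function names `parse_exons` and `is_coding` are my own, not part of any UCSC tool.

```python
def parse_exons(exon_starts, exon_ends):
    """Turn UCSC comma-separated coordinate strings into (start, end) tuples."""
    # The strings end with a trailing comma, so skip the empty last field.
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    if len(starts) != len(ends):
        raise ValueError("exonStarts and exonEnds differ in length")
    return list(zip(starts, ends))


def is_coding(cds_start, cds_end):
    """Assumption: an empty CDS interval marks a non-coding transcript."""
    return cds_end > cds_start


# Row 5 above (uc009vjh.1)
exons = parse_exons("55424,55751,58899,", "55436,55834,59692,")
print(exons)
# With half-open coordinates, exon length is simply end - start.
print([end - start for start, end in exons])
print(is_coding(58953, 59691))
```

With half-open intervals no +1 is needed for the lengths, which is easy to get wrong when mixing these rows with 1-based coordinates from other sources.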
Hmm... I tried this with some well-studied genes but am not getting any constitutive exons; I get all zeros. Can you show me a sample gene that has what you are looking for? I must be structuring the query wrong. This is an interesting question, btw.
From the BioMart web site I downloaded the latest human data with these attributes: Ensembl ID, transcript ID and constitutive exon. The constitutive exon attribute is not computed for NCBI36 in the archive, so I will use liftOver to convert the coordinates.
Some numbers: 1071242 exons, of which 73891 are constitutive. Those that have any constitutive exon: 16643 genes and 20743 transcripts.
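Counts like these can be reproduced from a tab-separated BioMart export with a few lines of Python. This is only a sketch: the column names `gene_id`, `transcript_id` and `constitutive`, the 0/1 coding of the flag, and the toy rows are all assumptions for illustration, so adjust them to the header of the file you actually downloaded.

```python
import csv
import io


def tally_constitutive(handle):
    """Count exon rows, constitutive rows, and the genes/transcripts having any."""
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    constitutive = 0
    genes, transcripts = set(), set()
    for row in reader:
        total += 1
        # Assumption: the flag is exported as the string "1" or "0".
        if row["constitutive"] == "1":
            constitutive += 1
            genes.add(row["gene_id"])
            transcripts.add(row["transcript_id"])
    return total, constitutive, len(genes), len(transcripts)


# Hypothetical toy data, not real Ensembl identifiers.
toy = io.StringIO(
    "gene_id\ttranscript_id\tconstitutive\n"
    "G1\tT1\t1\n"
    "G1\tT1\t0\n"
    "G1\tT2\t1\n"
    "G2\tT3\t0\n"
)
result = tally_constitutive(toy)
print(result)  # (4, 2, 1, 2)
```

For a real export, pass `open("your_export.tsv", newline="")` instead of the `StringIO` object (the file name is a placeholder).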