How do you find constitutive exons in Ensembl or UCSC?
7.2 years ago
ZheFrench ▴ 590

How do you find exons annotated as constitutive from official source Ensembl or UCSC?

I first tried BioMart, using the attribute Exon Information: Constitutive Exon, but that didn't give me what I expected. Most exons are annotated as 0, meaning alternative, whereas most of them should be constitutive. Take the SCRIB or CD44 genes as examples.

Someone on Biostars mentioned the UCSC knownAlt table, but I couldn't find it for hg38.

Or does someone have something on GitHub to share that does this from a GTF file?

exons constituve ensembl ucsc biomart • 3.6k views

Note to self: "constitutive exons" are exons that are consistently retained in the mature transcript after splicing.


Neither SCRIB nor CD44 should be annotated as having constitutive exons. Are you finding something that shows otherwise?


My PI does :). Yes, I understand, and in fact I totally agree. It really depends on which annotation you look at. UCSC displays GENCODE 24, where exon 35/37 appears to be alternative but all the others appear constitutive. In the GTF from GENCODE 25, based on Ensembl 85 (http://jul2016.archive.ensembl.org/index.html), new transcripts are present, and I can see that this changes which exons are alternative or constitutive. Look here: https://github.com/ZheFrenchKitchen/pics/blob/master/SCRIB.png

7.2 years ago
Emily 24k

Looking at your genes, there are no constitutive exons. SCRIB-207 shares no exons with SCRIB-204, for example: http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000180900;r=8:143790920-143815379

CD44-234 and CD44-225 share no exons either: http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000026508;r=11:35138870-35232402
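
A pairwise check like the one above can also be run against a GTF file. The following is only a minimal sketch and not something from this thread: the function name `shared_exons` is hypothetical, it assumes GENCODE/Ensembl-style attributes in column 9 (`transcript_id "..."`, `transcript_name "..."`), and it treats two exons as the same only if chromosome, start, end and strand are identical.

```shell
# List exons (chrom, start, end, strand) annotated in both transcripts.
# Usage: shared_exons ID_1 ID_2 [attribute_key] < annotation.gtf
# attribute_key defaults to transcript_id; pass transcript_name to use
# names such as SCRIB-207 (assumes the GTF carries that attribute).
shared_exons() {
    awk -F '\t' -v a="$1" -v b="$2" -v key="${3:-transcript_id}" '
    $3 == "exon" {
        exon = $1 "\t" $4 "\t" $5 "\t" $7
        # match the full quoted value so that T1 does not match T10
        if (index($9, key " \"" a "\"")) in_a[exon] = 1
        if (index($9, key " \"" b "\"")) in_b[exon] = 1
    }
    END {
        for (e in in_a) if (e in in_b) print e
    }' | sort -k1,1 -k2,2n
}
```

Empty output means the two transcripts have no exon with identical coordinates in that annotation. Note that GENCODE transcript IDs carry a version suffix, so the ID has to be given exactly as it appears in the file.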

7.2 years ago

If I'm not wrong (I haven't checked...), the following script should retrieve all the constitutive exons from UCSC/wgEncodeGencodeCompV27 (maybe not the best source of transcripts):

# remove any existing sqlite3 database file
rm -f tmp.sqlite
# download transcripts, create one table exon/transcript
curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/wgEncodeGencodeCompV27.txt.gz" |\
    gunzip -c | awk -F '\t' 'BEGIN {printf("create table T1(exon TEXT,transcript text);\nBEGIN TRANSACTION;\n");}{nExons=int($9);split($10,starts,/,/);split($11,ends,/,/); for(i=1;i<=nExons;i++) printf("INSERT INTO T1(exon,transcript) VALUES(\"%s_%s_%s\",\"%s\");\n",$3,starts[i],ends[i],$2);}END{printf("COMMIT;\n");}'  | sqlite3 tmp.sqlite


# download gene attributes, create one table gene/transcript
curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/wgEncodeGencodeAttrsV27.txt.gz" | gunzip  -c | cut -f1,5 | sort | uniq | awk -F '\t' 'BEGIN {printf("create table T2(gene TEXT,transcript text);\nBEGIN TRANSACTION;\n");} {printf("INSERT INTO T2(gene,transcript) VALUES(\"%s\",\"%s\");\n",$1,$2);}END{printf("COMMIT;\n");}' | sqlite3 tmp.sqlite

# count number of transcripts per gene
sqlite3 -noheader tmp.sqlite 'select gene from T2;' | sort | uniq -c| awk 'BEGIN {printf("create table T3(gene TEXT,num INTEGER);\nBEGIN TRANSACTION;\n");}{printf("insert into T3(num,gene) values(%s,\"%s\");\n",$1,$2);}END{printf("COMMIT;\n");}' > tmp.sql &&  sqlite3 tmp.sqlite < tmp.sql && rm tmp.sql

# count number of transcripts per exon
sqlite3 -noheader tmp.sqlite 'select exon from T1;' | sort | uniq -c| awk 'BEGIN {printf("create table T4(exon TEXT,num INTEGER);\nBEGIN TRANSACTION;\n");}{printf("insert into T4(num,exon) values(%s,\"%s\");\n",$1,$2);}END{printf("COMMIT;\n");}' > tmp.sql &&  sqlite3 tmp.sqlite < tmp.sql && rm tmp.sql

# join everything
sqlite3 tmp.sqlite 'select distinct T1.exon,T2.gene,T4.num from T1,T2,T3,T4 where T1.transcript = T2.transcript and T2.gene = T3.gene and T4.exon=T1.exon and T4.num=T3.num; '

# cleanup
rm tmp.sqlite

output:

(...)
chr22_50605367_50605443|ENSG00000008735.13|2
chr22_50605561_50605734|ENSG00000008735.13|2
chr22_50605824_50605934|ENSG00000008735.13|2
(...)
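
For the GTF case asked about in the question, the same counting idea (an exon is kept when it occurs in as many transcripts as its gene has) can be sketched with awk alone. This is a hedged sketch rather than a tested pipeline: the function name `constitutive_exons_from_gtf` is hypothetical, it assumes GENCODE/Ensembl-style `gene_id "..."` and `transcript_id "..."` attributes in column 9, and, unlike the SQL version, it counts exons per gene rather than by coordinates alone.

```shell
# Print exons present in every annotated transcript of their gene.
# Input : an uncompressed GTF on stdin.
# Output: chrom, start, end, strand, gene_id, transcript count (tab-separated).
# Example (file name is an assumption):
#   gunzip -c annotation.gtf.gz | constitutive_exons_from_gtf
constitutive_exons_from_gtf() {
    awk -F '\t' '
    # return the quoted value of attribute "key" from GTF column 9
    function attr(s, key,    i, n, parts, kv) {
        n = split(s, parts, /; */)
        for (i = 1; i <= n; i++) {
            if (index(parts[i], key " \"") == 1) {
                kv = substr(parts[i], length(key) + 3)
                sub(/".*$/, "", kv)
                return kv
            }
        }
        return ""
    }
    /^#/ { next }
    $3 == "exon" {
        gene = attr($9, "gene_id"); tx = attr($9, "transcript_id")
        if (gene == "" || tx == "") next
        # count each transcript once per gene
        if (!((gene, tx) in seen_tx)) { seen_tx[gene, tx] = 1; n_tx[gene]++ }
        # count each exon once per transcript
        exon = $1 "\t" $4 "\t" $5 "\t" $7
        if (!((gene, exon, tx) in seen_exon)) {
            seen_exon[gene, exon, tx] = 1
            n_exon[gene SUBSEP exon]++
        }
    }
    END {
        for (k in n_exon) {
            split(k, f, SUBSEP)          # f[1] = gene, f[2] = exon
            if (n_exon[k] == n_tx[f[1]]) print f[2] "\t" f[1] "\t" n_exon[k]
        }
    }' | sort -k1,1 -k2,2n
}
```

As with the SQL version, every exon of a single-transcript gene is reported, since it is trivially present in all transcripts of that gene; filtering on the last column (count greater than 1) drops those.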

Thanks for your input, by the way!


I tried to run this for hg19 but it does not work. Is it correct to use:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgEncodeGencodeCompV19.txt.gz

instead of

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/wgEncodeGencodeCompV27.txt.gz

and

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgEncodeGencodeAttrsV19.txt.gz

instead of

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/wgEncodeGencodeAttrsV27.txt.gz

Thanks in advance.
