list of all possible transcripts in ensembl format
4.2 years ago
anba • 0

Hi,

Where can I find a list of all possible transcripts (in Ensembl format, like ENST00000374811.8) together with their gene names?

RNA-Seq • 1.1k views

Hi, I think you can gather the information you need from Ensembl annotations.

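The following command extracts the transcript records from the Ensembl GTF. Note that the output is still in GTF format, not BED; the transcript ID and gene name sit in the attribute column (column 9).

awk '$3 == "transcript"' Homo_sapiens.GRCh38.101.gtf > transcript_info.gtf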
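If you only want a transcript-to-gene table rather than full GTF records, the attribute column can be parsed one step further. This is a minimal sketch, not a definitive implementation: `gtf_tx2gene` is a hypothetical helper name, and it assumes the usual Ensembl GTF attribute keys (`transcript_id`, `transcript_version`, `gene_id`, `gene_name`) and that no attribute value contains a semicolon.

```shell
# Hypothetical helper: print "transcript_id.version <TAB> gene_id <TAB> gene_name"
# for every transcript record in a GTF file given as the first argument.
gtf_tx2gene() {
  awk -F'\t' '
    # Return the value of one key from the attribute column ($9), or "" if absent.
    function attr(key,    n, i, parts, kv) {
      n = split($9, parts, ";")
      for (i = 1; i <= n; i++) {
        kv = parts[i]
        sub(/^ +/, "", kv)                    # drop leading spaces
        if (index(kv, key " \"") == 1) {      # entry starts with: key "
          kv = substr(kv, length(key) + 3)    # skip key, space and opening quote
          sub(/"$/, "", kv)                   # drop closing quote
          return kv
        }
      }
      return ""
    }
    $3 == "transcript" {
      id  = attr("transcript_id")
      ver = attr("transcript_version")
      if (ver != "") id = id "." ver          # versioned ID, e.g. ENST00000374811.8
      print id "\t" attr("gene_id") "\t" attr("gene_name")
    }
  ' "$1"
}
```

Usage would be something like `gtf_tx2gene Homo_sapiens.GRCh38.101.gtf > tx2gene.tsv`. Transcripts without a `gene_name` attribute get an empty third column.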

Hope this helps.

KB


Biomart certainly is the better answer!

4.2 years ago
ATpoint 85k

In R using biomaRt:

library(biomaRt)

tx2gene <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
                 mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl"))

head(tx2gene)

ensembl_transcript_id   ensembl_gene_id     hgnc_symbol
ENST00000387314         ENSG00000210049     MT-TF
ENST00000389680         ENSG00000211459     MT-RNR1
ENST00000387342         ENSG00000210077     MT-TV
ENST00000387347         ENSG00000210082     MT-RNR2
ENST00000386347         ENSG00000209082     MT-TL1
ENST00000361390         ENSG00000198888     MT-ND1

For mouse the dataset option would be mmusculus_gene_ensembl; for other organisms, check the biomaRt/Ensembl documentation for the correct string. If the HGNC symbol is empty for some rows in the table, this is not a bug: not every Ensembl gene ID has a matching HGNC (or, in the case of mouse, MGI) symbol.
