Question

Finding zebrafish promoters ENSEMBL

7.2 years ago

Seigfried ▴ 80

Hello, I have a list of Ensembl zebrafish genes whose promoters I want to extract, and there are far too many to do it manually.

I was reading the post "How to get all start and end positions of promoters in All the human genome ?", where @Emily_Ensembl showed how to extract human gene promoters.

I understand that there may be no such data for zebrafish yet.

Is it possible to extract the 2 kb regions upstream of the 5' UTR (that's how I am defining the promoter region) in Ensembl for zebrafish?

ensembl promoter • 3.6k views

updated 7.2 years ago by Ben Moore ★ 2.4k • written 7.2 years ago by Seigfried ▴ 80

The promoter region is defined somewhat arbitrarily, as anywhere from 200-300 bases to 2000 bases upstream of the annotated TSS (transcription start site) from Ensembl/RefSeq. So you can extract the 2 kb region upstream of the TSS and treat it as the promoter region. CAGE-seq is the best way to define TSSs; have a look at this paper https://www.ncbi.nlm.nih.gov/pubmed/24002785 and its data.
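As a minimal sketch of the coordinate arithmetic described above: assuming 1-based inclusive coordinates and a strand encoded as 1 or -1 (the Ensembl convention), the window upstream of a TSS could be computed as follows. The function name `upstream_window` and the coordinates in the usage line are hypothetical, chosen only for illustration.

```python
def upstream_window(tss, strand, length=2000, chrom_length=None):
    """Return (start, end) of the window immediately upstream of a TSS.

    Coordinates are 1-based and inclusive; strand is 1 (forward) or -1 (reverse).
    If end < start in the result, no upstream sequence exists.
    """
    if strand == 1:
        # Forward strand: upstream means lower genomic coordinates.
        start, end = tss - length, tss - 1
    elif strand == -1:
        # Reverse strand: upstream means higher genomic coordinates.
        start, end = tss + 1, tss + length
    else:
        raise ValueError("strand must be 1 or -1")
    # Clip the window to the chromosome bounds.
    start = max(start, 1)
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


# Hypothetical TSS at position 10000 on the reverse strand.
print(upstream_window(10000, -1))
```

The key point is that "upstream" depends on the strand: for a reverse-strand gene the promoter lies at higher coordinates than the TSS, and the extracted sequence must be reverse-complemented to read in the gene's orientation.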

Reply • 7.2 years ago by Chirag Nepal ★ 2.4k

Yes, but I have to do this for 2000 genes, so I cannot do it manually. Thank you for the paper.

Reply • 7.2 years ago by Seigfried ▴ 80

Hi Seigfried,

You are right in that regulatory feature annotation is only available for human and mouse in Ensembl at the moment. However, regulatory features are annotated independently of genes, and are not directly linked, anyway.

If you are interested in retrieving the 2kb upstream sequence though, you have a number of options, depending on your preferences:

BioMart: http://www.ensembl.org/biomart/martview/ We tend to advise that you retrieve data for around 500 genes per query, but you could split your list of 2000 genes into 4 blocks of 500 genes.

Your gene IDs will be the 'Filters' and then you can choose the required upstream sequence length as the Attribute.

BioMart tutorial and recorded demo are in the following documentation pages: http://www.ensembl.org/info/data/biomart/index.html

Perl API: Use the slice adaptor to retrieve slices with respect to your gene IDs: https://www.ensembl.org/info/docs/api/core/core_tutorial.html#slices

Then use the expand() method to define your upstream sequence of interest: http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Slice.html#ad16f93a7bf30d48820f421012616d56b

REST API: http://rest.ensembl.org/documentation/info/sequence_id_post Use the POST endpoint with the optional expand_5prime parameter.
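For the REST route, a rough Python sketch might look like the one below. Several details are assumptions rather than anything stated above: that the optional parameters (`type=genomic`, `expand_5prime`) can be passed in the query string of the POST, that 50 IDs per request is within the endpoint's POST limit, that each returned record carries `id` and `seq` fields, and that the sequence comes back in the gene's orientation so the flank is the first 2000 bases. The helper names are hypothetical. Please check the endpoint documentation linked above before relying on any of this.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def batches(ids, size=50):
    """Yield consecutive slices of `ids`, each with at most `size` entries."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def build_request(ids, upstream=2000):
    """Build a POST request for sequence/id with the 5' end expanded by `upstream` bp."""
    # Assumption: optional parameters are accepted in the query string.
    url = f"{SERVER}/sequence/id?type=genomic&expand_5prime={upstream}"
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch_upstream(ids, upstream=2000):
    """Return {gene_id: upstream sequence} for one batch (performs a network call)."""
    with urllib.request.urlopen(build_request(ids, upstream)) as response:
        records = json.load(response)
    # Assumption: the sequence is returned in gene orientation, so the
    # expanded 5' flank is the first `upstream` bases of each record.
    return {rec["id"]: rec["seq"][:upstream] for rec in records}
```

With a list of 2000 gene IDs, looping over `batches(gene_ids)` and calling `fetch_upstream` on each chunk would issue 40 requests; adding a short pause between them is a sensible courtesy to the server.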

I hope this helps you retrieve the data you need.

Best wishes,

Ben, Ensembl Helpdesk

Reply • 7.2 years ago by Ben Moore ★ 2.4k

Hello Ben,

Thank you for your reply.

I tried BioMart before, but I couldn't find a way to select the sequences 2 kb upstream of the TSS.

Regarding this specific point in your post: 'choose the required upstream sequence length as the Attribute.' Where can I find this option?

As a trial run, I set the Coordinate attribute to Start -1 End -2000, like this: http://asia.ensembl.org/biomart/martview/515dd4c62e0c878ec268ed9894ad5c16

But these are the gene coordinates with respect to the genome, which is not what I want. Could you please guide me?

Reply • 7.2 years ago by Seigfried ▴ 80

score 3 · Accepted Answer · 2018-03-02

Hi Seigfried,

Step 1: Choose Database: 'Ensembl Genes 91'. Choose Dataset: 'Zebrafish genes (GRCz10)'.

Step 2: Click 'Filters' in the menu on the left hand side. Expand 'Gene' panel. Check the 'Input external references ID list' box to add the filter and add the list of gene IDs into the text box (or upload a file with the list of gene IDs). Also, be sure to select the correct format of gene IDs that you have used as the input from the drop-down list.

Step 3: Click 'Attributes' in the menu on the left hand side. Click the 'Sequences' radio-button option. Expand 'Sequences' panel. Check the 'Flank (Gene)' option. Check the 'Upstream flank' box and input the desired length into the text box.

N.B. You can also add useful information to the sequence header by selecting from the options in the 'Header Information' panel.

Step 4: Click the 'Results' button in the top left-hand corner to view and download the results.
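The same selection can, in principle, be scripted rather than clicked through. The sketch below builds a BioMart XML query in Python. It rests on assumptions not stated in the steps above: the dataset name `drerio_gene_ensembl`, the filter names `ensembl_gene_id` and `upstream_flank`, the attribute name `gene_flank`, and the `martservice` URL are taken from memory of how BioMart names these items, so please verify them (for example with the 'XML' button on the BioMart page) before use. The main site may also serve a newer release and assembly than Ensembl Genes 91 / GRCz10. The gene IDs in the usage line are placeholders.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: BioMart's programmatic endpoint on the main Ensembl site.
BIOMART_URL = "http://www.ensembl.org/biomart/martservice"


def flank_query_xml(gene_ids, upstream=2000, dataset="drerio_gene_ensembl"):
    """Build a BioMart XML query asking for the upstream flank of each gene as FASTA."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters: the gene ID list (comma-separated) and the flank length.
    ET.SubElement(ds, "Filter", {"name": "ensembl_gene_id", "value": ",".join(gene_ids)})
    ET.SubElement(ds, "Filter", {"name": "upstream_flank", "value": str(upstream)})
    # Attributes: the flank sequence, plus the gene ID for the FASTA header.
    ET.SubElement(ds, "Attribute", {"name": "gene_flank"})
    ET.SubElement(ds, "Attribute", {"name": "ensembl_gene_id"})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


def flank_query_url(gene_ids, upstream=2000):
    """Return a URL that submits the query via the `query` parameter."""
    params = urllib.parse.urlencode({"query": flank_query_xml(gene_ids, upstream)})
    return BIOMART_URL + "?" + params


# Placeholder gene IDs, for illustration only.
print(flank_query_xml(["ENSDARG00000000001", "ENSDARG00000000002"]))
```

Following the earlier advice of roughly 500 genes per query, a list of 2000 IDs would be split into four blocks and one query built per block.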