How to find TFBS motifs within 5'-UTR sequences?
3.4 years ago
isha.lily20

Hello Researchers,

  1. Can anyone tell me how to extract only the 5'-UTR sequences in FASTA format?
  2. How can I find TFBS motifs within the 5'-UTRs?

Species: rice (transcriptome sequences)

I haven't tried anything yet, but I am thinking of using ChIPseeker to find TFBS. I am quite confused because I have transcriptome sequences: can I still take sequences from Ensembl BioMart?

3. Is there a specific website that provides rice transcriptome sequences together with their 5'-UTR sequences?

Thank you

3.4 years ago
JC

Hello,

1) You don't mention the species. If your species is in Ensembl, you can use BioMart (https://www.ensembl.org/biomart/) to export the 5' UTRs.
2) Again, it depends on the species; in general, there are tools to predict TFBS. Check http://molbiol-tools.ca/Transcriptional_factors.htm
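If you export 5' UTRs from BioMart as FASTA, transcripts without an annotated 5' UTR usually come through with placeholder text (typically "Sequence unavailable") instead of bases; treat that exact wording as an assumption and check your own export. A minimal Python sketch that reads such a FASTA and keeps only records whose body is made of nucleotide letters, so it does not depend on the exact placeholder text:

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]
NUCLEOTIDES = set("ACGTN")

def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def has_sequence(seq: str) -> bool:
    """True only if the body is non-empty and made of A/C/G/T/N (drops placeholders)."""
    return bool(seq) and set(seq.upper()) <= NUCLEOTIDES

def filter_utrs(lines: Iterable[str]) -> List[Record]:
    """Keep only records that contain an actual 5' UTR sequence."""
    return [(h, s) for h, s in read_fasta(lines) if has_sequence(s)]

def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA, wrapping sequence lines at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    # Hypothetical export: the second transcript has no annotated 5' UTR.
    export = [">transcript_A", "ATGCGC", "TTAA", ">transcript_B", "Sequence unavailable"]
    print(to_fasta(filter_utrs(export)), end="")
```

The transcript names are made up; in practice you would pass `open("export.fa")` (a hypothetical file name) instead of the list.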

Hi JC, it's rice transcriptome sequences. Can I still export the 5'-UTRs from BioMart?

3.4 years ago

UCSC Goldenpath offers 1 kb, 2 kb, and 5 kb sequences upstream of the transcription start sites of genes with annotated 5' UTRs, for various assemblies.

For hg38, for example, via http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/:

upstream1000.fa.gz - Sequences 1000 bases upstream of annotated transcription starts of RefSeq genes with annotated 5' UTRs. This file is updated regularly. It might be slightly out of sync with the RefSeq data shown on the browser, as it is updated daily for most assemblies.

upstream2000.fa.gz - Same as upstream1000, but 2000 bases.

upstream5000.fa.gz - Same as upstream1000, but 5000 bases.
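These files hold sequence upstream of the transcription start site (the promoter side), not the 5' UTR itself, and "upstream" flips with strand. A small sketch of how such a window could be computed from a transcript's coordinates, assuming 0-based, half-open (BED-style) coordinates and a chromosome already loaded as a string; it illustrates the idea and is not UCSC's actual pipeline:

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_window(tx_start: int, tx_end: int, strand: str, size: int = 1000,
                    chrom_len: Optional[int] = None) -> Tuple[int, int]:
    """BED-style (0-based, half-open) window of `size` bases upstream of the TSS,
    clipped to the chromosome ends."""
    if strand == "+":
        start, end = tx_start - size, tx_start  # TSS is tx_start; upstream lies to the left
    elif strand == "-":
        start, end = tx_end, tx_end + size      # TSS is tx_end - 1; upstream lies to the right
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(0, start)
    if chrom_len is not None:
        end = min(end, chrom_len)
    return start, end

def upstream_sequence(chrom_seq: str, tx_start: int, tx_end: int, strand: str,
                      size: int = 1000) -> str:
    """Upstream sequence written 5'->3' on the transcript's own strand."""
    start, end = upstream_window(tx_start, tx_end, strand, size, len(chrom_seq))
    region = chrom_seq[start:end]
    return region if strand == "+" else revcomp(region)

if __name__ == "__main__":
    chrom = "ACGTACGTAAAAAAAATTGC"  # toy chromosome, made up
    print(upstream_sequence(chrom, 8, 16, "+", size=4))  # ACGT
    print(upstream_sequence(chrom, 8, 16, "-", size=4))  # GCAA
```

The minus-strand case is reverse-complemented so that every output reads 5'->3' relative to the gene, which is what a motif scanner expects.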

For example, to download and expand the upstream1000.fa.gz file for hg38:

$ wget -qO- "http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/upstream1000.fa.gz" | gunzip -c > upstream1000.fa

Once you have selected the sequences for your assembly and have a transcription factor PWM database in hand (TRANSFAC, JASPAR, etc.), you can use FIMO to search for putative binding sites within those sequences.
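To illustrate the core idea behind a PWM scan in much simplified form: the sketch below turns a position frequency matrix into log-odds scores against a uniform background and reports windows on either strand that score above a raw cutoff. The uniform 0.25 background, the 0.1 pseudocount, the cutoff, and the "GATA" count matrix are all assumptions for illustration; FIMO itself converts scores to p-values and q-values and uses a background model, so use the real tool for actual analyses.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PWM = List[Dict[str, float]]

def log_odds(counts: List[Dict[str, float]], pseudo: float = 0.1,
             background: float = 0.25) -> PWM:
    """Per-position log2(p(base) / background), with a pseudocount for unseen bases."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0.0) for b in BASES) + 4 * pseudo
        pwm.append({b: math.log2((col.get(b, 0.0) + pseudo) / total / background)
                    for b in BASES})
    return pwm

def score(pwm: PWM, site: str) -> float:
    """Sum of per-position scores for a site of the motif's width."""
    return sum(pwm[i][base] for i, base in enumerate(site))

def scan(pwm: PWM, seq: str, threshold: float) -> List[Tuple[int, str, float, str]]:
    """Return (0-based start, strand, score, site) for windows scoring >= threshold.
    For '-' hits, `site` is the reverse complement of the window."""
    width, seq = len(pwm), seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other symbols
        for strand, site in (("+", window), ("-", window.translate(COMPLEMENT)[::-1])):
            s = score(pwm, site)
            if s >= threshold:
                hits.append((i, strand, round(s, 3), site))
    return hits

if __name__ == "__main__":
    # Made-up counts for a "GATA" motif, 10 observations per position.
    pwm = log_odds([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
    print(scan(pwm, "CCGATACC", threshold=6.0))  # one '+' hit at position 2
    print(scan(pwm, "CCTATCCC", threshold=6.0))  # one '-' hit at position 2
```

A fixed raw-score cutoff is the crude part; converting scores to p-values, as FIMO does, makes thresholds comparable between motifs of different widths.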

On Bioinformatics SE, I posted a walkthrough of the commands for running FIMO with a JASPAR TF database and sequences of interest, at a typical sensitivity threshold:

https://bioinformatics.stackexchange.com/questions/2467/where-to-download-jaspar-tfbs-motif-bed-file/2491#2491
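Once FIMO has run, its tab-separated output can be filtered further. A sketch that assumes the `fimo.tsv` layout written by recent MEME Suite releases (columns `motif_id`, `motif_alt_id`, `sequence_name`, `start`, `stop`, `strand`, `score`, `p-value`, `q-value`, `matched_sequence`, with trailing comment lines starting with `#`); check the header of your own file, since older versions used a different layout. The example rows are made up.

```python
import csv
import io
from typing import Dict, Iterable, List

Row = Dict[str, str]

def read_fimo_tsv(text: str) -> List[Row]:
    """Parse FIMO TSV output, skipping blank lines and '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))

def filter_hits(rows: Iterable[Row], column: str = "q-value",
                cutoff: float = 0.05) -> List[Row]:
    """Keep rows whose `column` value is present and <= cutoff."""
    kept = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if value and float(value) <= cutoff:
            kept.append(row)
    return kept

def sites_per_sequence(rows: Iterable[Row]) -> Dict[str, int]:
    """Count hits per input sequence (e.g. per 5' UTR)."""
    counts: Dict[str, int] = {}
    for row in rows:
        name = row["sequence_name"]
        counts[name] = counts.get(name, 0) + 1
    return counts

if __name__ == "__main__":
    header = ("motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\t"
              "score\tp-value\tq-value\tmatched_sequence")
    example = "\n".join([
        header,
        "M1\tTF_A\tutr_1\t5\t12\t+\t14.2\t1e-06\t0.01\tACGTACGT",
        "M1\tTF_A\tutr_2\t30\t37\t-\t9.1\t5e-05\t0.2\tACGTTCGT",
        "",
        "# hypothetical comment line",
    ])
    print(sites_per_sequence(filter_hits(read_fimo_tsv(example))))  # {'utr_1': 1}
```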

Another toolkit you might see mentioned is HOMER, but it is mainly used for de novo motif discovery, i.e., when you are looking for new or unpublished motifs.

One difference between HOMER and FIMO is that FIMO is used to scan for occurrences of published or known motifs, for which existing, experimentally validated PWM databases are available. HOMER's functionality is perhaps closer to that of MEME, which belongs to the same suite as FIMO (the MEME Suite). Like HOMER, MEME is used to find novel motifs.
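For contrast with scanning known PWMs, de novo discovery starts from the sequences themselves. MEME and HOMER are far more sophisticated, but the basic question of which short words are over-represented in a target set relative to a background set can be sketched with plain k-mer counting. This is a toy illustration with made-up sequences and an assumed pseudocount of 1, not the algorithm of either tool.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every k-mer made only of A/C/G/T across all sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def enriched_kmers(target: List[str], background: List[str], k: int = 4,
                   pseudo: float = 1.0, top: int = 5) -> List[Tuple[str, float]]:
    """Rank target k-mers by smoothed frequency ratio, target vs background."""
    t, b = kmer_counts(target, k), kmer_counts(background, k)
    t_total, b_total = sum(t.values()), sum(b.values())
    n_possible = 4 ** k  # number of distinct k-mers, used for smoothing
    ratios = []
    for kmer, n in t.items():
        t_freq = (n + pseudo) / (t_total + pseudo * n_possible)
        b_freq = (b.get(kmer, 0) + pseudo) / (b_total + pseudo * n_possible)
        ratios.append((kmer, t_freq / b_freq))
    ratios.sort(key=lambda item: item[1], reverse=True)
    return ratios[:top]

if __name__ == "__main__":
    target = ["CCGATACC", "TTGATATT", "AGATAG"]      # made-up "UTRs"
    background = ["CCCCCCCC", "TTTTTTTT", "AGAGAG"]  # made-up background
    print(enriched_kmers(target, background, k=4)[0])  # GATA ranks first
```

Real tools also model position-specific variation (a PWM rather than an exact word) and assess significance, which this sketch does not attempt.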

Hi Alex Reynolds, do you know of a specific website for the rice transcriptome that offers what you mentioned, i.e. 1 kb upstream sequences with annotated 5'-UTRs?

Not UCSC, but MSU keeps per-chromosome sequence and annotation files for japonica rice (O. sativa) here:

http://rice.plantbiology.msu.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_7.0/

The GFF3 files contain gene annotations, including regions with the feature type five_prime_UTR. I imagine those could be combined with the sequence files to generate the starting input for FIMO.
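A sketch of how the GFF3 five_prime_UTR records and the chromosome sequences could be combined into spliced 5' UTR sequences. It assumes each five_prime_UTR line carries a `Parent=` attribute pointing at its transcript (standard GFF3, but worth checking in the MSU files), that coordinates are 1-based and inclusive as GFF3 specifies, and that the genome has already been loaded into a dict of chromosome name to sequence. Segments of a multi-exon UTR are joined in genomic order and the result is reverse-complemented for minus-strand transcripts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import unquote

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
Segment = Tuple[str, int, int, str]  # (chrom, start, end, strand), GFF3 coordinates

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string into raw key=value pairs."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs

def five_prime_utrs(gff_lines: Iterable[str], genome: Dict[str, str],
                    feature: str = "five_prime_UTR") -> Dict[str, str]:
    """Return transcript ID -> spliced 5' UTR sequence, 5'->3' on the transcript strand."""
    segments: Dict[str, List[Segment]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        parents = parse_attributes(cols[8]).get("Parent", "")
        for tx in parents.split(","):  # GFF3 allows several comma-separated parents
            if tx:
                segments[unquote(tx)].append((chrom, start, end, strand))
    utrs = {}
    for tx, segs in segments.items():
        segs.sort(key=lambda s: s[1])
        # 1-based inclusive [start, end] maps to the Python slice [start - 1:end]
        seq = "".join(genome[c][s - 1:e] for c, s, e, _ in segs)
        utrs[tx] = revcomp(seq) if segs[0][3] == "-" else seq
    return utrs

if __name__ == "__main__":
    genome = {"chr01": "AAAACCCCGGGGTTTTACGT"}  # toy chromosome, made up
    gff = [
        "##gff-version 3",
        "chr01\t.\tfive_prime_UTR\t9\t12\t.\t+\t.\tID=utr2;Parent=tx1",
        "chr01\t.\tfive_prime_UTR\t1\t4\t.\t+\t.\tID=utr1;Parent=tx1",
        "chr01\t.\tfive_prime_UTR\t15\t18\t.\t-\t.\tID=utr3;Parent=tx2",
        "chr01\t.\tCDS\t5\t8\t.\t+\t0\tParent=tx1",
    ]
    for tx, seq in five_prime_utrs(gff, genome).items():
        print(f">{tx}\n{seq}")  # tx1: AAAAGGGG, tx2: GTAA
```

The transcript IDs and the toy genome are hypothetical; with the real files you would read the per-chromosome FASTA into `genome` first and write the resulting records out as FASTA for FIMO.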
