is there any online tool to know which transcription factors ( TFs) bind to the regulatory regions of a list of genes in zebrafish?
4
0
Entering edit mode
7.1 years ago

I have a list of 1900 genes of zebrafish. i would like to find out the which TFs bind to my list of genes of zebrafish. please send links

RNA-Seq genome next-gen gene blast • 3.8k views
ADD COMMENT
1
Entering edit mode
7.1 years ago
GenoMax 147k

Try Contra v3 server in "exploration" mode.

ADD COMMENT
0
Entering edit mode

I have t search for every single gene, so it is not possible for my 1900 genes list

ADD REPLY
1
Entering edit mode
7.1 years ago

I am assuming you have tools like wget and grep etc. and that you have installed BEDOPS. I am also assuming you are working with the zebrafish build danRer7, and that you have samtools-indexed FASTA files for this genome downloaded locally.

To build a BED file of intervals for your genes:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/danRer7/database/refGene.txt.gz | gunzip -c > danRer7.refGene.txt
$ grep -f genes.txt danRer7.refGene.txt | awk -v OFS="\t" '{ print $3,$5,$6,$13,$9,$4 }' | sort-bed - > danRer7.genes.bed

To generate TSS regions, separate by strand and get the start position of the gene:

$ awk -v OFS="\t" '($6=="+"){ print $1, $2, ($2+1), $4, $5, $6; }' danRer7.genes.bed > danRer7.genes.start.for.bed
$ awk -v OFS="\t" '($6=="-"){ print $1, $3, ($3+1), $4, $5, $6; }' danRer7.genes.bed > danRer7.genes.start.rev.bed

Then apply a 5kb asymmetric padding around the start positions (you can pick a different window to define the regulatory region upstream of the TSS):

$ bedops --range -5000:0 --everything danRer7.genes.start.for.bed > danRer7.genes.start.for.TSS5k.bed
$ bedops --range 0:5000 --everything danRer7.genes.start.rev.bed > danRer7.genes.start.rev.TSS5k.bed

Take the union of the two files:

$ bedops --everything danRer7.genes.start.for.TSS5k.bed danRer7.genes.start.rev.TSS5k.bed > danRer7.genes.TSS5k.bed

Separate out the padded TSSs per gene:

$ awk '{ print $0 >> "danRer7.TSS5k."$4".bed"; }' danRer7.genes.TSS5k.bed

Generate FASTA for each gene using the bed2faidx script. Here's how you would do this for one gene, say "LOC567192":

$ bed2faidx --fastaDir=/path/to/danRer7/indexed/fasta < danRer7.TSS5k.LOC567192.bed > danRer7.TSS5k.LOC567192.fa

You have ~1900 genes in genes.txt, so it is probably easier to use a loop:

$ while IFS= read -r gene; do echo "$gene"; bed2faidx --fastaDir=/path/to/danRer7/indexed/fasta < "danRer7.TSS5k.${gene}.bed" > "danRer7.TSS5k.${gene}.fa"; done < genes.txt

Once you have FASTA for each gene's padded TSSs, you can run MEME on the sequence information to get motif calls. You would then run TOMTOM to match what MEME discovers against entries in databases of published TF motifs.

The web interface for MEME and TOMTOM will only let you do one set of regions at a time. I would suggest you take one gene's sequence information and run it through these two tools to see what you get back, to get familiar with what these tools do.

Since you need to automate this, my advice is, once you know how MEME and TOMTOM work, you then download the source code and databases and build command-line versions of these tools that you can run locally. Then you know exactly what parameters go into your search. This is going beyond the scope of the answer here, but you can search Biostars and there should be answers that have already covered this topic.

ADD COMMENT
0
Entering edit mode

Nice comprehensive post!

ADD REPLY
0
Entering edit mode
7.1 years ago

1) Use BEDOPS bedops to generate a BED file of regulatory regions of genes (like a 5kb region upstream of TSS, say).

2) Convert BED to FASTA with samtools and bed2faidx for your indexed reference genome.

3) Run the FASTA through MEME to find putative motifs.

4) Run the motif PWMs through TOMTOM to find highest ranking matches with known/published TF motifs (e.g., matches with JASPAR, TRANSFAC, UniPROT, etc.).

Search engines on the relevant terms will get you links to the tools mentioned here.

ADD COMMENT
0
Entering edit mode

Could you send some tutorial to get bed file of regulatory genes. Can this method give regulatory sequence bed file for my list of genes at a time. Do I need to work out for each gene at a time. Please, guide me how to find regulatory region of gene with bedops because i am not familiar to this.

ADD REPLY
0
Entering edit mode

What do you have for genes? Do you just have gene names, or annotations with intervals?

ADD REPLY
0
Entering edit mode

I have only Gene names only

ADD REPLY
0
Entering edit mode

Can you post your genes in a file somewhere public?

ADD REPLY
0
Entering edit mode

I will post. Where do you want me to post.

ADD REPLY
0
Entering edit mode

Not sure, up to you. Dropbox, Google Drive, whatever you have.

ADD REPLY
0
Entering edit mode

I don't have your email id to share through Dropbox

ADD REPLY
0
Entering edit mode

Upload to pastebin.

ADD REPLY
0
Entering edit mode

I have uploaded in pastebin

ADD REPLY
0
Entering edit mode

You need to post the link for your list on pastebin here. Otherwise how can @Alex find the data.

ADD REPLY
0
Entering edit mode

https://pastebin.com/rN7vw8n7 I have uploaded in this link

ADD REPLY
0
Entering edit mode

You don’t need my email address to put the items into a Dropbox Public folder and share them.

ADD REPLY
0
Entering edit mode
7.1 years ago
ZhouK.R. ▴ 10

You can use “Download” module of ChIPBase to get all genes regulated by transcription factors or histone modification marks in Zebrafish, and then filter the results with you gene list.

ADD COMMENT
0
Entering edit mode

link you have provided is not opening

ADD REPLY
0
Entering edit mode

The website is temporarily blocked by the firewall because of the secure policy. You can send your list of genes to my mailbox and I will do this analysis.

ADD REPLY
0
Entering edit mode

Could you send your email id

ADD REPLY

Login before adding your answer.

Traffic: 2572 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6