I have a list of 14 genes in Arabidopsis thaliana, all related to a common biological function (response to cadmium ion). I would like to know whether there is any common transcription factor that regulates all or some of these 14 genes. Thanks.
One way to do this, perhaps:
1) Convert your gene annotations-of-interest to a sorted BED file via convert2bed from BEDOPS, e.g.:
$ convert2bed --input=gff < genes.gff > genes.bed
2) Make a BED file of the regulatory or promoter regions, using the BED-formatted gene file and bedops --range -N:0 for forward-strand oriented annotations and bedops --range 0:N for reverse-strand oriented annotations, e.g. for a 1kb region upstream of the TSS:
$ awk '$6=="+"' genes.bed > genes.for.bed
$ awk '$6=="-"' genes.bed > genes.rev.bed
$ bedops --range -1000:0 --everything genes.for.bed > promoters.for.bed
$ bedops --range 0:1000 --everything genes.rev.bed > promoters.rev.bed
$ bedops --everything promoters.*.bed > promoters.bed
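Adjust this window or the set operations, depending on what parts of the genome you decide you want to call a promoter, with respect to the gene or genes of interest. Note that --range pads the existing gene interval, so as written each region spans the gene body plus the 1kb upstream of it.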
3) Use a samtools-indexed FASTA of your build of Arabidopsis with a script like bed2faidx.pl to convert the BED-formatted promoter regions to sequences:
$ bed2faidx.pl < promoters.bed > promoters.fa
4) Use web or command-line MEME to query the promoter sequences for putative TF motifs.
5) Use command-line TOMTOM to query the MEME TF motifs for matches against known/published TFs from databases like JASPAR Plantae.
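As a rough sketch of steps 2 and 3 without extra scripts: if only the upstream window is wanted (rather than the gene interval padded by 1kb), plain awk can compute it from the six-column BED, and the same file can be turned into 1-based region strings of the kind samtools faidx accepts. The function names upstream_windows and bed_to_regions are made up for this example, and the code assumes tab-separated BED with the strand in column 6.

```shell
# Hypothetical helper: print the N bp directly upstream of each gene's TSS.
# Reads tab-separated, six-column BED on stdin; N defaults to 1000.
upstream_windows() {
    awk -v n="${1:-1000}" 'BEGIN { FS = OFS = "\t" }
    $6 == "+" { s = $2 - n; if (s < 0) s = 0; e = $2 }  # clamp at chromosome start
    $6 == "-" { s = $3; e = $3 + n }                    # not clamped at chromosome end
    ($6 == "+" || $6 == "-") && e > s { $2 = s; $3 = e; print }'
}

# Hypothetical helper: convert 0-based half-open BED intervals into
# 1-based inclusive "chrom:start-end" region strings.
bed_to_regions() {
    awk 'BEGIN { FS = "\t" } NF >= 3 { print $1 ":" ($2 + 1) "-" $3 }'
}
```

For example, upstream_windows 1000 < genes.bed | sort-bed - > promoters.bed, then bed_to_regions < promoters.bed > regions.txt, and finally samtools faidx genome.fa -r regions.txt > promoters.fa, where genome.fa stands in for your indexed Arabidopsis FASTA. The sequences come out on the forward strand regardless of gene orientation, so it is probably worth letting MEME search both strands.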
If you want a more fine-grained answer, you might split the gene annotations from step 1 into per-gene files, and run steps 2-5 on each gene separately. You can then get a pool of published TFs per-gene and determine from there where there are similarities or unique hits.
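To compare the per-gene pools, one simple option is to count how many distinct genes each TF shows up for. The sketch below assumes you have already collected the hits into tab-separated gene/TF pairs (for instance from the target IDs in each gene's TOMTOM results); that two-column layout and the function name shared_tfs are assumptions for illustration, not anything TOMTOM produces directly.

```shell
# Hypothetical helper: for each TF, count the distinct genes it was matched to.
# Input on stdin: tab-separated "gene<TAB>TF" pairs, one per line.
shared_tfs() {
    LC_ALL=C sort -u |                        # drop repeated gene/TF pairs
    awk 'BEGIN { FS = OFS = "\t" }
         NF == 2 { n[$2]++ }                  # one count per distinct gene
         END { for (tf in n) print n[tf], tf }' |
    LC_ALL=C sort -k1,1nr -k2,2               # most widely shared TFs first
}
```

Running shared_tfs < gene_tf_pairs.tsv (a placeholder file name) lists the TFs hit by the most genes at the top; anything with a count near 14 would be a candidate common regulator worth a closer look.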
I wish I could help more, but I haven't worked with Arabidopsis data much, really. My answer's more of a generic approach. Maybe there are Arabidopsis-specific mailing lists or forums where people know about curated or pre-calculated datasets (https://www.arabidopsis.org/help/faq.jsp), or perhaps ask on the Bioinformatics Stack Exchange? There might be some plant biology specialists there who can point you in a better direction.
Check at Araport.
Thanks. In which part of this webpage? Could you please explain more clearly?
I only suggested that as a free resource for Arabidopsis genomics (since TAIR requires a subscription) that you may find useful. It may or may not have the information you are looking for.