Hello!
I have a couple of questions.
I have a list of genes that are co-expressed during a differentiation process. The list comes from time-course RNA-seq data, on which I performed hierarchical clustering for a set of genes related to different GO terms.
I pasted the list into Cytoscape and used iRegulon to get a list of putative TFs that may control the expression of these genes. The results look pretty interesting, but I would like to try other tools or predictive approaches on the same list. Do you have any suggestions?
On this point, I was considering using some tools from the MEME Suite to identify common motifs in a set of "promoter regions" defined for all the genes in my list. So the question is: how can I retrieve fixed-length sequences relative to the TSS for 60-100 genes at the same time? I used to do this on EPD, but I can't do it systematically there.
I use R as my main programming language.
Thank you!
It would help if you told us which organism you are working with. If you have the genome in FASTA and an annotation (GFF3 or similar), it's trivial to write a Perl/Python script to retrieve all promoters.
Hello! I am working on human cell lines. Are there any examples I could use as a reference?
Get a GTF file, extract the transcription start sites, and then define a window of e.g. -500 bp relative to each TSS as a proxy for the promoter. No magic here.
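For reference, here is a minimal Python sketch of those steps. It assumes an Ensembl/GENCODE-style GTF that has "gene" feature lines and a gene_name attribute, and it takes the gene start (or the gene end on the minus strand) as the TSS, which ignores alternative transcript start sites. The function names, the toy GTF lines and the gene names GENEA/GENEB are made up for illustration.

```python
def parse_attributes(field):
    """Turn the GTF attribute column (key "value"; pairs) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip('"')
    return attrs


def promoter_windows(gtf_lines, genes, upstream=500, downstream=0):
    """Yield (chrom, start, end, name, strand) for each requested gene.

    Coordinates are 0-based, half-open (BED style). The window covers
    `upstream` bases before the TSS and `downstream` bases starting at
    the TSS itself, always measured along the gene's own strand.
    """
    wanted = set(genes)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        name = parse_attributes(cols[8]).get("gene_name")
        if name not in wanted:
            continue
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if strand == "+":
            tss = start               # 1-based position of the TSS
            lo = tss - 1 - upstream
            hi = tss - 1 + downstream
        else:
            tss = end                 # minus strand: TSS is the gene end
            lo = tss - downstream
            hi = tss + upstream
        yield chrom, max(lo, 0), hi, name, strand


# Toy annotation (hypothetical genes) standing in for a real GTF file.
DEMO_GTF = (
    '1\ttoy\tgene\t1000\t3000\t.\t+\t.\tgene_id "G1"; gene_name "GENEA";\n'
    '1\ttoy\ttranscript\t1000\t3000\t.\t+\t.\tgene_id "G1"; gene_name "GENEA";\n'
    '2\ttoy\tgene\t5000\t8000\t.\t-\t.\tgene_id "G2"; gene_name "GENEB";\n'
)

windows = list(promoter_windows(DEMO_GTF.splitlines(), ["GENEA", "GENEB"]))
for chrom, lo, hi, name, strand in windows:
    # BED6 columns: chrom, start, end, name, score, strand
    print("\t".join(map(str, (chrom, lo, hi, name, ".", strand))))
```

With a real annotation you would pass an open file handle instead of DEMO_GTF.splitlines() and write the printed lines to a BED file. The sequences can then be pulled from the genome FASTA with something like bedtools getfasta using its -s option, so that minus-strand promoters come out reverse-complemented.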
You can also use BioMart: https://www.ensembl.org/biomart/martview