Hello all,
My lab has sequenced the genome of a bee, and my goal is to obtain all the transcription factors from the genome. This is not ChIP-seq data, so I don't have information on peaks, and I was wondering how to go about this process. Sorry if this is a silly question, but all the tutorials I come across for finding TF motifs seem to use ChIP-seq peaks, which we don't have.
I have the assembled genome and a separate annotation file of gene IDs. I think I need to scan the genome for DNA-binding domains (DBDs) upstream of genes, but I am really unclear on how to do this. I did come across this paper that says to find DBDs first and then use InterProScan, but its links are broken. Since I am working with a bee, I would likely need to use DBDs already identified in other invertebrates; is this right? Can anyone point me to a working database that has DBDs for invertebrates?
Is my thinking also correct? Can I use InterProScan on an assembled genome to obtain a list of enriched TFs?
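A minimal sketch of the "find DBDs, then InterProScan" step, under stated assumptions. InterProScan is normally run on protein sequences (for example, the proteins predicted from the genome annotation), and in its TSV output each line is one domain hit, with the protein ID in column 1, the member database in column 4 and the signature accession in column 5. The short Pfam list below is illustrative only and NOT a complete DBD catalogue; in practice you would swap in a curated list of DNA-binding domain families. The protein IDs in the demo rows are made up. The result is a set of candidate TF proteins (those carrying a known DBD), not an enrichment test.

```python
import csv
from collections import defaultdict

# Illustrative, NOT exhaustive: a few Pfam families that are DNA-binding domains.
# This list is an assumption for the example; replace it with a curated DBD list.
DBD_PFAM = {
    "PF00046": "Homeodomain",
    "PF00096": "zf-C2H2",
    "PF00010": "HLH",
    "PF00170": "bZIP_1",
    "PF00105": "zf-C4 (nuclear receptor DBD)",
    "PF00250": "Forkhead",
}


def candidate_tfs(tsv_lines):
    """Return {protein_id: set of DBD names} for proteins with a DBD hit.

    Expects InterProScan TSV rows: col 1 = protein accession,
    col 4 = analysis/member database, col 5 = signature accession.
    """
    hits = defaultdict(set)
    for row in csv.reader(tsv_lines, delimiter="\t"):
        # Skip short/malformed rows and non-Pfam analyses
        if len(row) < 5 or row[3] != "Pfam":
            continue
        if row[4] in DBD_PFAM:
            hits[row[0]].add(DBD_PFAM[row[4]])
    return dict(hits)


if __name__ == "__main__":
    # Two made-up rows in InterProScan TSV layout (hypothetical protein IDs)
    demo = [
        "g001.t1\tabc\t350\tPfam\tPF00046\tHomeodomain\t10\t66\t1.2E-20\tT\t01-01-2024",
        "g002.t1\tdef\t210\tPfam\tPF00069\tPkinase\t5\t200\t3.4E-40\tT\t01-01-2024",
    ]
    print(candidate_tfs(demo))  # {'g001.t1': {'Homeodomain'}}
```

Filtering on Pfam accessions rather than description text keeps the match robust to wording changes between database releases.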
I would then be looking for TF binding sites, because I will be using the program metalysis. From what I understand, I will obtain binding motifs and their "scores", which are then used in metalysis. It uses the scores and their P-values to look for associations between the cis-regulatory elements (is this another term for binding motifs?) and the expression pattern of the DEGs they are upstream of.
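To get at the "cis-regulatory elements upstream of the DEGs", a common first step is pulling out a fixed-length region upstream of each gene from the assembly and annotation. This is a minimal sketch, assuming GFF3-style 1-based inclusive coordinates with a strand, and a 1,000 bp window (an arbitrary, hypothetical choice). I can't say what input format metalysis expects; this only shows the coordinate logic, including reverse-complementing for minus-strand genes.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Minimal FASTA reader: {first word of header: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, chrom, start, end, strand, length=1000):
    """Return up to `length` bp upstream of a gene, written 5'->3' on the gene's strand.

    start/end are 1-based inclusive (as in GFF3); length=1000 is an assumption.
    """
    seq = genome[chrom]
    if strand == "+":
        # Upstream of a + strand gene lies before its start (0-based index start-1)
        left = max(0, start - 1 - length)
        return seq[left:start - 1]
    # Upstream of a - strand gene lies after its end; flip it onto the gene's strand
    right = min(len(seq), end + length)
    return reverse_complement(seq[end:right])


if __name__ == "__main__":
    genome = read_fasta([">chr1 toy", "AAACCC", "GGGTTT"])  # made-up toy sequence
    print(upstream_region(genome, "chr1", 7, 9, "+", 3))  # CCC
    print(upstream_region(genome, "chr1", 4, 6, "-", 3))  # CCC
```

Clipping at the sequence ends means genes near a scaffold boundary return a shorter region rather than raising an error.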
But I have no clue how to even begin obtaining TF motifs (from the examples I've seen online, they should look like letter graphs) that also produce a "score". I am also unsure what the "score" is, as this isn't explained in the papers I'm reading.
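In most motif-scanning tools (FIMO from the MEME Suite, for example), the "score" of a candidate site is a log-odds score: for each position in the window, the log of (probability of that base under the motif / probability under a background model), summed over the window. The P-value is then the chance that a random background sequence scores at least that high. I can't confirm this is exactly the score metalysis expects, so treat this as a sketch under that assumption. It converts a count matrix into a log-odds weight matrix, scores windows on the forward strand only, and estimates an empirical P-value by sampling random sequences. The uniform background is an assumption; a background taken from the genome's own base composition is usually more appropriate. The toy matrix is made up.

```python
import math
import random

BASES = "ACGT"


def counts_to_log_odds(counts, pseudocount=0.8, background=None):
    """Turn a count matrix {base: [count per position]} into a log-odds
    weight matrix: log2(motif probability / background probability)."""
    background = background or {b: 0.25 for b in BASES}  # defaults to uniform
    pwm = {b: [] for b in BASES}
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in BASES) + pseudocount
        for b in BASES:
            # Pseudocount is split by background so no probability is zero
            p = (counts[b][i] + pseudocount * background[b]) / total
            pwm[b].append(math.log2(p / background[b]))
    return pwm


def score_site(pwm, site):
    """Log-odds score of one window (uppercase A/C/G/T only)."""
    return sum(pwm[base][i] for i, base in enumerate(site))


def best_hit(pwm, seq):
    """(score, offset) of the best-scoring window, forward strand only."""
    w = len(pwm["A"])
    return max((score_site(pwm, seq[i:i + w]), i) for i in range(len(seq) - w + 1))


def empirical_p(pwm, observed, n=10000, seed=0):
    """Fraction of random uniform-background windows scoring >= observed (add-one corrected)."""
    rng = random.Random(seed)
    w = len(pwm["A"])
    hits = sum(
        score_site(pwm, "".join(rng.choice(BASES) for _ in range(w))) >= observed
        for _ in range(n)
    )
    return (hits + 1) / (n + 1)


if __name__ == "__main__":
    # Toy 4-position count matrix (made up, not a real TF motif)
    toy = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0], "G": [0, 0, 0, 0], "T": [10, 0, 10, 0]}
    pwm = counts_to_log_odds(toy)
    score, offset = best_hit(pwm, "GGTATAGG")
    print(offset, round(score, 2), empirical_p(pwm, score))
```

Real scanners also score the reverse strand and compute P-values exactly from the matrix rather than by sampling; sampling is used here only because it is short and easy to follow.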
You can download the TF motif matrices (position frequency matrices) from the JASPAR website (https://jaspar.genereg.net/).
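JASPAR's CORE collection can be filtered by taxonomic group (there is an insects group), and matrices can be downloaded in the plain-text "JASPAR" format: a header line starting with ">" followed by the matrix ID and name, then one row of counts per base. A minimal parser sketch, assuming that layout; the toy entry (ID "TOY0001.1", name "toy_ebox") is made up, not a real JASPAR record.

```python
TOY_JASPAR = """\
>TOY0001.1 toy_ebox
A  [ 0 20  0  0  0  0 ]
C  [20  0 20  0  0  0 ]
G  [ 0  0  0 20  0 20 ]
T  [ 0  0  0  0 20  0 ]
"""


def parse_jaspar(text):
    """Parse JASPAR-format matrices into {matrix_id: (name, {base: [counts]})}."""
    motifs, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            current = parts[0]
            motifs[current] = (parts[1] if len(parts) > 1 else "", {})
        else:
            # e.g. "A  [ 0 20  0 ]" -> base "A", counts [0.0, 20.0, 0.0]
            base, values = line.split(None, 1)
            motifs[current][1][base] = [float(x) for x in values.strip("[] ").split()]
    return motifs


def consensus(counts):
    """Most frequent base at each position (ties go to the first of A, C, G, T)."""
    width = len(counts["A"])
    return "".join(max("ACGT", key=lambda b: counts[b][i]) for i in range(width))


if __name__ == "__main__":
    for motif_id, (name, counts) in parse_jaspar(TOY_JASPAR).items():
        print(motif_id, name, consensus(counts))  # TOY0001.1 toy_ebox CACGTG
```

The parsed counts are the "letter graphs" (sequence logos) in numeric form; a logo is just a drawing of these per-position base frequencies.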