Question

How do you find transcription factors in a newly sequenced genome?

0

2.1 years ago

DNAngel ▴ 250

Hello all,

My lab has sequenced the genome of a bee, and my goal is to obtain all the transcription factors from the genome. This is not ChIP-seq data, so I don't have any information on peaks, and I was wondering how to go about this. Sorry if this is a silly question, but every tutorial I come across for finding TF motifs uses ChIP-seq peaks, which we don't have.

I have the assembled genome and a separate annotation file of gene IDs. I think I just need to scan the genome for DNA-binding domains (DBDs) upstream of genes, but I am really unclear on how to do this. I did come across a paper that says to find DBDs first and then use InterProScan, but its links are broken. Since I am working with a bee, I would likely need to use DBDs that have already been identified in other invertebrates. Is this right? Can anyone please point me to a working database that has DBDs for invertebrates?

Is my thinking also correct? Can I run InterProScan on an assembled genome to obtain a list of enriched TFs?

interproscan factors transcription • 1.0k views

ADD COMMENT • link updated 2.1 years ago by xmLiu ▴ 20 • written 2.1 years ago by DNAngel ▴ 250

score 2 · Answer 1 · 2022-10-19

2

2.1 years ago

Mensur Dlakic ★ 28k

You seem to be mixing up transcription factors (TFs), which are proteins, and their binding motifs, which are DNA sequences. Additionally, not every protein with a DNA-binding domain (DBD) is a TF, whereas sequence-specific TFs do contain a DBD. What exactly is it that you want to do?

If you want to find the proteins, that is done as for any other protein family: annotate your protein sequences against a database of known protein families. InterPro, CDD, and SMART would all be good choices. If you want to find DNA motifs, that is more complicated, but it still starts with finding the DNA-binding proteins.
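As a rough sketch of that protein-side step: InterProScan is run on the predicted protein sequences rather than on the raw assembly (for example `interproscan.sh -i proteins.fasta -f tsv`, where `proteins.fasta` is a placeholder name), and the tab-separated output can then be filtered for DNA-binding domain families. The short Pfam list, the protein IDs, and the example rows below are illustrative assumptions, not a complete or authoritative TF catalogue.

```python
import csv
import io
from collections import defaultdict

# Illustrative, deliberately incomplete set of Pfam DNA-binding domain families.
# A real analysis would use a curated DBD list.
DBD_PFAM = {
    "PF00046": "Homeodomain",
    "PF00096": "C2H2 zinc finger",
    "PF00010": "Helix-loop-helix",
    "PF00170": "bZIP",
    "PF00105": "Nuclear receptor zinc finger (C4)",
    "PF00250": "Forkhead",
}


def candidate_tfs(tsv_handle, dbd_ids=DBD_PFAM):
    """Return {protein ID: set of DBD Pfam accessions} from InterProScan TSV rows.

    In the TSV output, column 1 is the protein accession, column 4 the
    analysis (e.g. "Pfam"), and column 5 the signature accession.
    """
    hits = defaultdict(set)
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        protein_id, analysis, signature = row[0], row[3], row[4]
        if analysis == "Pfam" and signature in dbd_ids:
            hits[protein_id].add(signature)
    return dict(hits)


# Hypothetical toy rows standing in for a real InterProScan TSV file.
example = (
    "bee_prot_1\tmd5a\t350\tPfam\tPF00046\tHomeodomain\t120\t176\t1.2E-20\tT\t01-01-2022\n"
    "bee_prot_2\tmd5b\t500\tPfam\tPF00069\tProtein kinase domain\t10\t260\t3.0E-50\tT\t01-01-2022\n"
    "bee_prot_3\tmd5c\t410\tPfam\tPF00096\tZinc finger, C2H2 type\t200\t222\t5.0E-6\tT\t01-01-2022\n"
)

result = candidate_tfs(io.StringIO(example))
for protein, domains in sorted(result.items()):
    print(protein, sorted(domains))
```

Proteins that pass this filter are only candidates: a DBD hit alone does not prove the protein is a TF, so the list still needs manual review.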

0

I would be looking for TF binding sites, then, because I will be using the program metalysis. From what I understand, I first need to obtain binding motifs and their "scores". metalysis then uses the scores and their P-values to look for associations between the cis-regulatory elements (is this another term for binding motifs?) and the expression patterns of the DEGs they are upstream of.

But I have no clue how to even begin obtaining TF motifs (from the examples I've seen online, they should look like letter graphs) that also come with a "score". I am unsure what the "score" is, as this isn't explained in the papers I'm reading.
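For context, in most motif-scanning tools the "score" of a site is a log-odds score: a count matrix for the motif is converted to a position weight matrix (PWM), and each window of sequence is scored by summing the per-position log-odds values. Whether metalysis uses exactly this definition is an assumption here. The count matrix, pseudocount, uniform background, and sequence below are all made up for illustration.

```python
import math

# Hypothetical count matrix for a 4-bp motif, one dict per motif position.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background; a real genome would have its own base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def to_log_odds(counts, background, pseudocount=0.5):
    """Convert counts to log2(observed frequency / background frequency)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in column
        })
    return pwm


def score_window(pwm, window):
    """Sum the log-odds values of the bases in one motif-length window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window on the given strand."""
    width = len(pwm)
    scored = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


pwm = to_log_odds(COUNTS, BACKGROUND)
score, position = best_hit(pwm, "TTACGTGG")
print(f"best score {score:.2f} at offset {position}")
```

A positive score means the window looks more like the motif than like background sequence. This sketch scans one strand only and does not compute P-values, which dedicated scanners derive from the score distribution.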

ADD REPLY • link 2.1 years ago by DNAngel ▴ 250

2

You can download the TF motif matrices from the JASPAR website (https://jaspar.genereg.net/).
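A minimal sketch of reading such a download, assuming the file is in the bracketed JASPAR text format (a `>ID name` header followed by one row of counts per base). The motif ID, name, and counts below are toy values, not a real JASPAR entry, and the function names are hypothetical.

```python
import io
import re


def parse_jaspar(handle):
    """Parse JASPAR-format text into {motif ID: {base: [counts per position]}}."""
    motifs = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: ">ID name"; keep the ID as the key.
            current = line[1:].split()[0]
            motifs[current] = {}
        elif current is not None:
            # Row line such as "A  [ 4 19  0 ]": first character is the base.
            base = line[0].upper()
            numbers = re.findall(r"\d+(?:\.\d+)?", line[1:])
            motifs[current][base] = [float(n) for n in numbers]
    return motifs


def consensus(matrix):
    """Most frequent base at each position of a count matrix."""
    width = len(matrix["A"])
    return "".join(
        max("ACGT", key=lambda base: matrix[base][i]) for i in range(width)
    )


# Toy entry for illustration only.
example = """>TOY0001.1 toyTF
A  [ 4 19  0  0  0  0 ]
C  [16  0 20  0  0  0 ]
G  [ 0  1  0 20  0 20 ]
T  [ 0  0  0  0 20  0 ]
"""

motifs = parse_jaspar(io.StringIO(example))
toy_consensus = consensus(motifs["TOY0001.1"])
print(toy_consensus)
```

The count matrices obtained this way are what gets converted to log-odds scores when scanning upstream regions for binding sites.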

ADD REPLY • link 2.1 years ago by xmLiu ▴ 20