Question

creating a PWM from TFBS data in a GFF file

10.0 years ago

Affan ▴ 310

I downloaded the TFBS data from the Riken 4 database. My task is to extract the binding sites for the transcription factor MEF2 and then build a PWM for analysis. Since this is for my research, I'd like to make sure that I am doing this correctly.

1. To extract only the MEF2 binding sites, which column should I be looking at?

Edit: The last column (column 9) seems to give me the information. For the MEF2 family of proteins, it's annotated as:

TF_binding_site_cage_181208 MEF2A,C,D-173792 ;ALIAS MEF2A,MEF2C,MEF2D ;L3_ID L3_chr7_-_150385881

To extract this specific data I used the command

awk -F"\t" '$9~/MEF2/' file > output

2. Now suppose I have all the rows for the MEF2 TF. For each row, I have a start and end for the binding site. What software is usually used to perform the alignment so that I can calculate the frequency counts?

3. Relating to question 2, do I have to worry about the strand? I don't think so.

4. Are there any software tools or papers that cover this from start to end?
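
To make questions 1 to 3 concrete, here is a minimal Python sketch of the pipeline being asked about. It is not a recommended tool, only an illustration under several assumptions: the file follows the standard nine-column GFF layout (start, end and strand in columns 4, 5 and 7), the site sequences have already been fetched from the genome and all have the same length, minus-strand sites are reverse-complemented so that every site reads in the same orientation, and the background is uniform. The function names, the pseudocount of 0.5 and the toy sequences are hypothetical.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement, so a minus-strand site reads 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mef2_rows(gff_lines, pattern="MEF2"):
    """Yield (chrom, start, end, strand) for rows whose column 9 contains pattern.

    Mirrors the awk filter on column 9; GFF coordinates are 1-based, inclusive.
    """
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and pattern in cols[8]:
            yield cols[0], int(cols[3]), int(cols[4]), cols[6]


def count_matrix(sites):
    """Position frequency matrix: one Counter of bases per alignment column."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must all have the same length")
    return [Counter(s[i].upper() for s in sites) for i in range(length)]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert counts to log2 odds against a uniform background (an assumption)."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in "ACGT") + 4 * pseudocount
        pwm.append(
            {b: math.log2((col[b] + pseudocount) / total / background) for b in "ACGT"}
        )
    return pwm


if __name__ == "__main__":
    # Toy, made-up sites; the third one stands in for a minus-strand hit.
    sites = ["CTAAAAATAG", "CTATTTATAG", reverse_complement("CTATTTTTAG")]
    for position, column in enumerate(log_odds_pwm(count_matrix(sites)), start=1):
        print(position, {base: round(score, 2) for base, score in column.items()})
```

Fetching the sequence for each (chrom, start, end) interval from a genome FASTA is deliberately left out, since it depends on the genome build the coordinates refer to.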

Disclaimer: I am a math grad student doing research in bioinformatics. So while I'll be okay once I have the PWM, it's the tools, software, and biological knowledge needed to get there that I need help with.

ChIP-Seq PWM • 2.0k views

updated 2.8 years ago by Ram 44k • written 10.0 years ago by Affan ▴ 310