Creating a PWM from TFBS data in a GFF file
10.0 years ago
Affan ▴ 310

I downloaded the TFBS data from the Riken 4 database. My task is to extract the binding sites for the transcription factor MEF2 and then build a position weight matrix (PWM) for analysis. Since this is for my research, I'd like to make sure that what I am doing is right.

1. To extract only the MEF2 transcription factor, what column am I looking at?

Edit: The last column (column 9) seems to give me the information. For the MEF2 family of proteins, it's annotated as

TF_binding_site_cage_181208 MEF2A,C,D-173792 ;ALIAS MEF2A,MEF2C,MEF2D ;L3_ID L3_chr7_-_150385881

To extract these rows I used the command

awk -F"\t" '$9~/MEF2/' file > output
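For anyone who prefers to do the same filter in a script, here is a minimal Python sketch of roughly what the awk command does. It assumes the standard nine tab-separated GFF columns (sequence, source, type, start, end, score, strand, phase, attributes). The function name `filter_gff_by_attribute` and the two example rows are hypothetical; the coordinates and scores are toy values, not real data. Note that it does a plain substring match rather than the regex match awk uses, so any annotation containing "MEF2" is kept.

```python
def filter_gff_by_attribute(lines, pattern="MEF2"):
    """Yield (seqid, start, end, strand, attributes) for rows whose
    ninth column contains `pattern` as a plain substring."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and GFF comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip malformed rows that do not have all nine columns.
        if len(fields) < 9:
            continue
        if pattern in fields[8]:
            # GFF coordinates are 1-based and inclusive.
            yield fields[0], int(fields[3]), int(fields[4]), fields[6], fields[8]


# Toy rows: only the first attribute string comes from my file,
# every other value here is made up for illustration.
example_rows = [
    "chr7\tsrc\tTF_binding_site\t100\t112\t0.9\t-\t.\t"
    "TF_binding_site_cage_181208 MEF2A,C,D-173792 ;ALIAS MEF2A,MEF2C,MEF2D ;L3_ID L3_chr7_-_150385881",
    "chr1\tsrc\tTF_binding_site\t200\t210\t0.5\t+\t.\t"
    "TF_binding_site_cage_181208 OTHER-1 ;ALIAS OTHER ;L3_ID L3_chr1_+_1",
]

hits = list(filter_gff_by_attribute(example_rows))
for seqid, start, end, strand, _ in hits:
    print(seqid, start, end, strand)
```

With a real file, the same function could be called as `filter_gff_by_attribute(open("file"))`, where `file` is the same placeholder name as in the awk command.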

2. Now suppose I have all the rows for the MEF2 TF. For each row, I have a start and end for the binding site. What software is usually used to align the sites so that I can calculate the frequency counts?

3. Related to question 2, do I have to worry about the strand? I don't think so.

4. Is there any software or paper that covers this process from start to end?
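To make questions 2 and 3 concrete, here is a minimal Python sketch of the counting step, under some loud assumptions: the site sequences have already been pulled out of the genome, they all have the same length and are already in register (so no alignment step is needed), and a uniform background of 0.25 per base is acceptable. The function names and the toy sequences are made up for illustration. The sketch also shows one reason the strand might matter: a site annotated on the minus strand would presumably have to be reverse-complemented before counting, otherwise its columns would not line up with the plus-strand sites.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only; N would raise KeyError)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_matrix(sites):
    """Build per-position base counts from (sequence, strand) pairs.

    Minus-strand sequences are reverse-complemented first so that all
    sites are counted in the same orientation.
    """
    oriented = [
        reverse_complement(seq) if strand == "-" else seq.upper()
        for seq, strand in sites
    ]
    length = len(oriented[0])
    if any(len(seq) != length for seq in oriented):
        raise ValueError("all sites must have the same length")
    counts = {base: [0] * length for base in "ACGT"}
    for seq in oriented:
        for position, base in enumerate(seq):
            counts[base][position] += 1
    return counts


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    n_sites = sum(counts[base][0] for base in "ACGT")
    total = n_sites + 4 * pseudocount
    return {
        base: [math.log2((c + pseudocount) / total / background) for c in column]
        for base, column in counts.items()
    }


# Toy input: made-up 4-mers, as if read from the plus strand of the genome.
toy_sites = [("AAGT", "+"), ("ACTT", "-"), ("AACT", "+")]
counts = count_matrix(toy_sites)
pwm = log_odds_pwm(counts)
for base in "ACGT":
    print(base, counts[base], [round(score, 2) for score in pwm[base]])
```

The pseudocount is there only to avoid taking the log of zero for bases that never occur at a position; its value of 1.0 is an arbitrary choice on my part.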

Disclaimer: I am a math grad student doing research in bioinformatics. So while I'll be okay once I get the PWM, it's the tools, software and biological knowledge needed to get there that I need help with.

ChIP-Seq PWM • 2.0k views