Hello all,
I am working with Transfac data. I tried to parse the data and assign each entry to an Ensembl ID. In the Transfac Pro database there are ~400 factors that I cannot assign programmatically, either because the gene name is an alias or because it is written differently from the entries in the dictionary I used.
While assigning these manually, I came across a matrix whose gene name is COUP. I searched for COUP on the GeneCards website and found two types, COUP TF1 and COUP TF2. I am not sure which one to use, and when I read matrix.dat carefully, both of them appear in the BF lines. After checking the definition of the BF field in the Transfac documentation, I am even more confused. Below is the description:
BF Binding Factors list of linked entries of the Factor table (factor accession number; factor name; biological species); if a binding site for this factor was used to compile the matrix, this is indicated, otherwise the factor has been linked by its homology to the directly involved factors
Can anyone explain what this means?
I am also confused because in other matrices the gene name (NA tag) and the BF tag can list different genes. Below is another example, for E47:
NA E47
BF T00204; E12; Species: human, Homo sapiens; site(s) included: yes.
BF T15605; E2A; Species: human, Homo sapiens; site(s) included: yes.
BF T00207; E47; Species: human, Homo sapiens; site(s) included: yes.
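In case it helps, here is a minimal sketch of how the NA and BF lines above can be read in Python. The function name parse_record and the regular expression are my own assumptions based only on the lines shown here, not an official Transfac parser; BF lines that do not follow this exact layout are skipped.

```python
import re

# Layout assumed from the BF lines above:
# "BF <accession>; <factor name>; Species: <species>; site(s) included: yes|no."
BF_PATTERN = re.compile(
    r"^BF\s+(?P<accession>T\d+);\s*(?P<name>[^;]+);"
    r"\s*Species:\s*(?P<species>[^;]+);"
    r"\s*site\(s\) included:\s*(?P<included>yes|no)\.?\s*$"
)


def parse_record(lines):
    """Collect the matrix name (NA) and the linked factors (BF) of one record."""
    record = {"name": None, "factors": []}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("NA"):
            record["name"] = line[2:].strip()
        elif line.startswith("BF"):
            match = BF_PATTERN.match(line)
            if match is None:
                continue  # layout differs from the assumed one; skip the line
            record["factors"].append({
                "accession": match.group("accession"),
                "name": match.group("name").strip(),
                "species": match.group("species").strip(),
                "site_included": match.group("included") == "yes",
            })
    return record


example = [
    "NA E47",
    "BF T00204; E12; Species: human, Homo sapiens; site(s) included: yes.",
    "BF T15605; E2A; Species: human, Homo sapiens; site(s) included: yes.",
    "BF T00207; E47; Species: human, Homo sapiens; site(s) included: yes.",
]
record = parse_record(example)
for factor in record["factors"]:
    print(record["name"], factor["accession"], factor["name"], factor["site_included"])
```

With this, each matrix name ends up next to every factor accession listed in its BF lines, which is where I see one matrix name linked to several factor names.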
Also, could you suggest how to assign a matrix name to its corresponding Ensembl ID? Thank you.