Hi everyone,
I want to build a matrix with gene IDs as row names and transcription factor (TF) IDs as column names, where each value is the number of binding-site occurrences of that TF assigned to its nearest gene.
So far I have produced a data frame holding these three pieces of information with the following commands:
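library(dplyr)
library(data.table)
# Count TF binding-site occurrences per gene/TF pair.
# Group by bare column names: group_by(geneTF.df$geneID, ...) would create columns literally named "geneTF.df$geneID".
geneTF.mtx.tmp = geneTF.df %>% group_by(geneID, TFID) %>% summarise(occurancy = n())
# Create a data table from the summarised data frame (column names must match those created above)
geneTF.mtx.tmp2 = data.table(geneID = geneTF.mtx.tmp$geneID, TfID = geneTF.mtx.tmp$TFID, occurancy = geneTF.mtx.tmp$occurancy)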
I obtained a table like this:
> head(geneTF.mtx.tmp2)
   geneID                TfID                                                     occurancy
1: CMiso1.1chr01g0058191 ABF1(bZIP)/Arabidopsis-ABF1-ChIP-Seq(GSE80564)/Homer            24
2: CMiso1.1chr01g0058191 At3g60580(C2H2)/col-At3g60580-DAP-Seq(GSE60143)/Homer           17
3: CMiso1.1chr01g0058191 At5g04390(C2H2)/col200-At5g04390-DAP-Seq(GSE60143)/Homer        19
4: CMiso1.1chr01g0058191 AT5G60130(ABI3VP1)/col-AT5G60130-DAP-Seq(GSE60143)/Homer        24
5: CMiso1.1chr01g0058191 ATAF1(NAC)/col-ATAF1-DAP-Seq(GSE60143)/Homer                    32
6: CMiso1.1chr01g0058191 AtGRF6(GRF)/col-AtGRF6-DAP-Seq(GSE60143)/Homer                   3
To summarize, from this table I want to obtain a matrix whose row names are the "geneID" column, whose column names are the "TfID" column, and whose values come from the "occurancy" column.
Thanks in advance for your response
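For readers following along, here is a minimal base-R sketch of the long-to-wide reshaping described above. It is not necessarily the approach that was eventually accepted in this thread. The toy data frame `geneTF.long` and its gene and TF names are made up for illustration; only the column names geneID, TfID and occurancy come from the post.

```r
# Toy long-format table; gene and TF names are hypothetical.
geneTF.long <- data.frame(
  geneID    = c("geneA", "geneA", "geneB", "geneB", "geneC"),
  TfID      = c("TF1", "TF2", "TF1", "TF3", "TF2"),
  occurancy = c(24, 17, 19, 3, 32),
  stringsAsFactors = FALSE
)

# xtabs() sums occurancy for every geneID x TfID pair;
# pairs that never occur in the long table are filled with 0.
geneTF.tab <- xtabs(occurancy ~ geneID + TfID, data = geneTF.long)

# Strip the xtabs/table class and the stored call to leave a plain numeric matrix
# with geneID as row names and TfID as column names.
geneTF.mtx <- unclass(geneTF.tab)
attr(geneTF.mtx, "call") <- NULL

print(geneTF.mtx)
```

Because each gene ends up on exactly one row, the repeated geneID values of the long table are no longer a problem for row names.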
Hi bruce.moran,
Thanks a lot for your help; your last suggestion works well. I will take your advice into account, thanks.
Try this:
@clementpch But this would fail: row names must be unique, and the geneID column contains repeated values.
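To illustrate the point about unique row names, here is a small sketch using a made-up data frame called `long` (the gene and TF names are hypothetical). It assumes the objection concerns setting geneID as the row names of a long-format data frame, which R rejects when values repeat.

```r
# Hypothetical long-format data: geneA appears on two rows.
long <- data.frame(
  geneID    = c("geneA", "geneA", "geneB"),
  TfID      = c("TF1", "TF2", "TF1"),
  occurancy = c(24, 17, 19),
  stringsAsFactors = FALSE
)

# Setting duplicated values as data-frame row names raises an error;
# tryCatch() records that instead of stopping the script.
failed <- tryCatch(
  suppressWarnings({
    rownames(long) <- long$geneID
    FALSE
  }),
  error = function(e) TRUE
)
print(failed)

# After aggregating to one row per gene, the row names are unique.
# Gene/TF pairs missing from the long table come out as NA with tapply().
wide <- tapply(long$occurancy, list(long$geneID, long$TfID), sum)
print(anyDuplicated(rownames(wide)))
```

So the long table has to be reshaped to one row per gene before geneID can serve as row names.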