Hi,
I have a list of TFs and marker genes for each cluster of my cells.
I want a data frame like the one below. For example, if a gene is in my TF list, a 1 is placed in the TF column in front of that gene, meaning that it is a TF. Likewise, if a gene is in my markers, a 1 is placed in front of it; if not, a 0 is added. But this really seems too difficult for me to do.
Supposing I just have 14000 genes and 3 vectors, how can I make a data frame like the one below from scratch? A 3-column data frame with the 14000 genes in rows and a TF column, a marker_1 column and a marker_2 column, filled with 1 or 0 depending on whether each of the 14000 genes exists in these vectors or not.
> head(genes)
TF use_as_marker_1 use_as_marker_2
ENSMUSG00000044719 0 1 0
ENSMUSG00000044591 0 0 0
ENSMUSG00000044712 0 0 0
ENSMUSG00000044734 0 0 1
ENSMUSG00000044726 0 0 0
ENSMUSG00000044724 1 0 0
Could you please help me to do that?
Please, could you provide a complete example of what you want to achieve? What do you have in your lists? Create a small example by hand to describe your problem and your aim.
Seems like you will have to put your 2 lists in 2 vectors then iterate over your dataframe, compare your current TF to your TF vector and compare your current gene to your gene vector and change the TF value if needed.
It is not completely clear to me what the data you have looks like, and an example often helps.
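To make the "iterate and compare" idea concrete, here is a minimal sketch in R. It assumes the gene IDs are the row names of the data frame, as in the example output in the question. The names all_genes, tf_genes, marker1_genes and marker2_genes are hypothetical, and their contents were picked by hand so that the result matches that example table.

```r
# Gene IDs copied from the example table in the question.
all_genes <- c("ENSMUSG00000044719", "ENSMUSG00000044591", "ENSMUSG00000044712",
               "ENSMUSG00000044734", "ENSMUSG00000044726", "ENSMUSG00000044724")

# Hypothetical membership vectors (assumed, chosen to reproduce the example).
tf_genes      <- c("ENSMUSG00000044724")
marker1_genes <- c("ENSMUSG00000044719")
marker2_genes <- c("ENSMUSG00000044734")

# Start with every flag set to 0; gene IDs become the row names.
genes <- data.frame(TF = 0L, use_as_marker_1 = 0L, use_as_marker_2 = 0L,
                    row.names = all_genes)

# Iterate over the rows and switch a flag to 1 when the ID is in a vector.
for (id in rownames(genes)) {
  if (id %in% tf_genes)      genes[id, "TF"] <- 1L
  if (id %in% marker1_genes) genes[id, "use_as_marker_1"] <- 1L
  if (id %in% marker2_genes) genes[id, "use_as_marker_2"] <- 1L
}

print(genes)
```

Because %in% is vectorised, the loop is not strictly needed: a single line per column, such as genes$TF <- as.integer(rownames(genes) %in% tf_genes), should give the same result and will be faster on 14000 rows.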
Thanks a lot. Let's say I have a vector of 150 TFs, a vector of 500 marker genes for cluster 1 and a vector of 350 marker genes for cluster 2. My genome has 14000 validated gene IDs. I need a data frame like the one above, in which my 14000 genes are compared with the vector of TFs for the TF column, and with the marker genes of cluster 1 and of cluster 2 for the marker columns, so that if one of the 14000 genes is also in one of my three vectors I have 1, and if not I have 0. For example, in the data frame above ENSMUSG00000044724 is a TF but not a marker of any cluster, ENSMUSG00000044719 is a marker of cluster 1 and ENSMUSG00000044734 is a marker of cluster 2. Unfortunately, I am not able to do iteration or anything complex in R without your help.
OK, so in the end you have 3 lists (TF, marker1 and marker2). And for each line of your dataframe:
- If TF_SOURCE exists in the TF list, set the TF column to 1
- If the index (gene name?) exists in the marker1 list, set the use_as_marker_1 column to 1
- Same as the previous item for marker2
Correct?
if my TF column exists in TF list
If your TF does not exist in the TF list, do you want to test marker1 and marker2 anyway, or skip them?
Actually, there is no TF defined in the data frame; please ignore the TF_SOURCE column.
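With the TF_SOURCE column out of the picture, the task reduces to three membership tests against the full gene list. Below is a minimal sketch of building the data frame from scratch, assuming the sizes mentioned in this thread (14000 genes, 150 TFs, 500 and 350 markers). The gene IDs and the three vectors are simulated placeholders, not real data, and the names all_genes and gene_sets are hypothetical; replace them with your own vectors.

```r
set.seed(1)

# Simulated stand-ins for the 14000 validated gene IDs (not real Ensembl IDs).
all_genes <- sprintf("ENSMUSG%011d", 1:14000)

# Simulated membership vectors with the sizes given in the thread.
# The list names become the column names of the result.
gene_sets <- list(
  TF              = sample(all_genes, 150),
  use_as_marker_1 = sample(all_genes, 500),
  use_as_marker_2 = sample(all_genes, 350)
)

# For each vector, test every gene for membership and convert TRUE/FALSE to 1/0.
flags <- sapply(gene_sets, function(s) as.integer(all_genes %in% s))

# Gene IDs go into the row names, as in the example table.
genes <- data.frame(flags, row.names = all_genes)

# Sanity check: each column should contain as many 1s as its vector has genes.
stopifnot(all(colSums(genes) == c(150, 500, 350)))

head(genes)
```

Keeping the vectors in a named list means that a further cluster only needs one more list entry, and the colSums check is a quick way to spot IDs in a vector that are missing from the gene list.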