I have a dataframe meth which has genes (HGNC symbols) as rownames and samples as column names.
I want to find which genes in the rownames are transcription factors (TFs) using biomaRt in R. The result should be returned as a vector.
Example:
> rownames(meth)
[1] "A1BG" "A1CF" "A2BP1" "A2LD1"
[5] "A2M" "A2ML1" "A4GALT" "AAAS"
If A1BG, A2BP1, and A2LD1 are transcription factors, return them as a vector:
[1] "A1BG" "A2BP1" "A2LD1"
On the BioMart website I can choose, for example, Database: Ensembl Regulation 107 and Dataset: Human Regulatory Features.
But I want to find the TFs using R code.
My preliminary attempt did not filter for transcription factors.
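# Biomart query
library(biomaRt)

if (interactive()) {
  mart <- useEnsembl(biomart = "ensembl",
                     dataset = "hsapiens_gene_ensembl")
  # `values` must be paired with a `filters` entry; here the symbols in meth
  # are matched on hgnc_symbol. This still does not select TFs.
  # If an attribute such as p_value is rejected, check listAttributes(mart).
  getBM(attributes = c("ensembl_gene_id", "p_value", "hgnc_symbol", "entrezgene_id"),
        filters = "hgnc_symbol",
        values = rownames(meth),
        mart = mart)
}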
Hi, you could download the list of human TFs from this website: http://humantfs.ccbr.utoronto.ca/download.php
(This TF list is part of this Cell review: https://www.sciencedirect.com/science/article/pii/S0092867418301065?via%3Dihub)
Then you can check which of these TFs match the rownames of your dataframe using an R function such as inner_join from the dplyr package.
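The matching step could look roughly like the sketch below. It is only an illustration under some assumptions: meth is rebuilt here as a toy data frame, tf_symbols is a placeholder vector that mirrors the hypothetical in the question (it is not real TF annotation), and the file name in the comment is hypothetical. It uses base R %in% rather than inner_join, which gives the result directly as a character vector.

```r
# Toy stand-in for the data frame in the question:
# genes as rownames, samples as columns.
genes <- c("A1BG", "A1CF", "A2BP1", "A2LD1", "A2M", "A2ML1", "A4GALT", "AAAS")
meth <- data.frame(sample1 = seq_along(genes),
                   sample2 = rev(seq_along(genes)),
                   row.names = genes)

# Placeholder TF list (assumption, not real annotation).
# With the downloaded list you would read the symbols instead, e.g.
# tf_symbols <- readLines("TF_names.txt")  # hypothetical local file name
tf_symbols <- c("A2LD1", "A1BG", "NOT_IN_METH", "A2BP1")

# Keep the rownames that occur in the TF list, in rowname order.
tf_in_meth <- rownames(meth)[rownames(meth) %in% tf_symbols]
print(tf_in_meth)
```

If you also need the methylation values for those genes, meth[tf_in_meth, ] subsets the data frame to the matching rows.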