I am working with TCGA data and would like to divide cases into groups with different activity levels of given transcription factors, and then perform, for example, differential expression analysis. I recently learned about the ISMARA web tool, which estimates TF activity from the expression levels of its target genes, but it requires raw sequencing data, which are not freely available in the TCGA database. I think this could be done using count data, but I do not know of any tools for it. Can you tell me what I can use? Are there any R packages?
Alternatively, what can I use to find the target genes of transcription factors?
Gene expression values are normalized, and much of the information in the raw data is lost. You can't recover read peaks from a gene-level expression value, so I think raw FASTQ, BAM, or SAM data are necessary for ISMARA.
There are many databases that contain the target genes of transcription factors, such as TRANSFAC, JASPAR, and HOCOMOCO.
Why is information lost after normalization? Most tools calculate a single scaling factor per sample, so the data distribution and the size relationships between genes remain untouched. Also, I do not think that any of the databases you list contain target genes; they contain curated DNA motifs, nothing more.
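To illustrate the scaling point, here is a minimal Python sketch. The gene names, counts, and scaling factor are all made up for illustration. It only shows that dividing every gene in a sample by one factor leaves the ratios between genes and their rank order unchanged.

```python
# Hypothetical raw counts for one sample (made-up numbers, illustration only).
raw_counts = {"GENE_A": 100, "GENE_B": 400, "GENE_C": 50}

# Assumed per-sample scaling factor, e.g. one derived from library size.
scaling_factor = 2.0

# Normalization by a single factor: every gene is divided by the same number.
normalized = {gene: count / scaling_factor for gene, count in raw_counts.items()}


def ratio(values, gene_x, gene_y):
    """Expression ratio between two genes within one sample."""
    return values[gene_x] / values[gene_y]


def ranking(values):
    """Genes ordered from lowest to highest expression."""
    return sorted(values, key=values.get)


# Compare gene-to-gene relationships before and after scaling.
ratio_preserved = ratio(raw_counts, "GENE_B", "GENE_A") == ratio(normalized, "GENE_B", "GENE_A")
rank_preserved = ranking(raw_counts) == ranking(normalized)
print(ratio_preserved, rank_preserved)
```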
After normalization, we have an expression value such as FPKM, RPKM, or an RSEM estimate. But we lose the information about which parts of a gene have more reads and which have fewer. Peaks are very important in transcription factor binding analysis.
These are well-known TF target databases; I have used them before.
You can use ARACNe and VIPER to estimate TF activity from expression data. ARACNe builds a network of regulated genes, and it has already been run on most TCGA data. VIPER itself is available as an R package.
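As a rough illustration of the general idea of scoring TF activity from target-gene expression and then grouping cases, here is a minimal Python sketch. It is not VIPER's actual algorithm: it simply takes the mean z-score of a TF's assumed targets in each sample and splits the samples at the median. The sample names, expression values, and target list are all hypothetical.

```python
from statistics import mean, median, pstdev

# Hypothetical normalized expression: gene -> one value per sample (made-up numbers).
samples = ["S1", "S2", "S3", "S4"]
expression = {
    "TARGET_1": [1.0, 2.0, 8.0, 9.0],
    "TARGET_2": [2.0, 1.0, 7.0, 10.0],
    "OTHER": [5.0, 5.0, 4.0, 6.0],
}
# Hypothetical list of genes assumed to be activated by the TF of interest.
tf_targets = ["TARGET_1", "TARGET_2"]


def zscores(values):
    """Standardize one gene across samples (assumes the gene is not constant)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Activity proxy: mean z-score of the target genes in each sample.
target_z = [zscores(expression[gene]) for gene in tf_targets]
activity = {s: mean(column) for s, column in zip(samples, zip(*target_z))}

# Split samples into two groups at the median activity.
cutoff = median(activity.values())
groups = {s: ("high" if a > cutoff else "low") for s, a in activity.items()}
print(groups)
```

The resulting high/low labels could then serve as the grouping variable in a differential expression analysis. A real analysis would use a proper regulon, for example one inferred by ARACNe, and a dedicated activity method rather than this simple average.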