Dear Biostars Friends, Hi
I want to report transcription factors (TFs) that are expressed differentially between my 2 conditions (male/female) of fish RNA-seq data.
I have run BLASTX with my transcriptome assembly against a zebrafish TF dataset (e-value < 1e-5), with some help from @Devon and @Prasad.
8409 of my unique transcripts show hits to 1459 unique zebrafish TFs (the number is 8409 because each gene has several isoforms in the Trinity assembly). Then I extracted TMM-normalized FPKM values for each identified TF.
Then I performed DEG analysis using DESeq2 and searched for the DE transcripts that had a hit in the TF dataset.
Among those 8409 transcripts (with TF hits), 69 were among the female up-regulated transcripts and 19 among the male up-regulated transcripts (as reported by DESeq2). (So the remaining 8321 transcripts with TF hits are not differentially expressed between males and females.)
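The overlap step described above can be sketched as plain set operations. This is only a minimal illustration: the function name `overlap_with_tf_hits` and the toy transcript IDs are hypothetical, and it assumes the DESeq2 up-regulated lists have already been exported as collections of transcript IDs.

```python
def overlap_with_tf_hits(tf_hits, female_up, male_up):
    """Split TF-hit transcripts into female-up, male-up and not-DE sets.

    tf_hits:   IDs of transcripts with a BLASTX hit to a TF (assumed input)
    female_up: IDs DESeq2 reported as up-regulated in females (assumed input)
    male_up:   IDs DESeq2 reported as up-regulated in males (assumed input)
    """
    tf_hits = set(tf_hits)
    tf_female_up = tf_hits & set(female_up)
    tf_male_up = tf_hits & set(male_up)
    # "not DE" only means "not called significant", not "identical expression"
    not_de = tf_hits - tf_female_up - tf_male_up
    return tf_female_up, tf_male_up, not_de


# Toy IDs, purely illustrative
tf_hits = {"tx1", "tx2", "tx3", "tx4"}
female_up = {"tx2", "tx9"}   # tx9 is DE but has no TF hit
male_up = {"tx3"}

tf_female_up, tf_male_up, not_de = overlap_with_tf_hits(tf_hits, female_up, male_up)
print(len(tf_female_up), len(tf_male_up), len(not_de))  # 1 1 2

# Same bookkeeping with the counts from the post
print(8409 - 69 - 19)  # 8321
```

Note that the third set holds transcripts that were not called significant, which is a weaker statement than "expressed at the same level".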
Now I intend to show them by creating a scatter plot for example with the FPKM for each TF in males on the X-axis and the FPKM for each TF in females on the Y-axis.
In this case, TFs with the same expression in males and females will fall on a line with a slope of 1. Differentially expressed transcripts will deviate from that line.
I have also seen something about Fisher's exact test in this regard.
My questions:
1- I have 3 FPKM values for each transcript in males and 3 in females (biological replicates). How should I show these 6 numbers in a scatter plot? Should I calculate the mean FPKM of each TF transcript for males and for females and then use that mean for each sex? Or should I use the FDR values from the DESeq2 DEG analysis?
2- Is this method of visualization good? If not, what are your suggestions?
~ Thank you in advance
Dear Spacemorrissey, Hi and thanks for your help.
So, I must calculate the mean for each of my 8409 TF transcripts once for males and once for females and then draw the scatter plot. Did I get the point correctly?
I have tried the mean of TMM-FPKM and the plot was not what I need. I guess I must use the FDR value instead of the mean expression.
Plotting FPKM in males vs females should give you what you want. I do not think you want to plot the FDR values.
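A rough sketch of that suggestion, assuming the three male and three female replicate FPKM values sit in one row per transcript. The function names, the column layout, the `log2(mean + 1)` transform and the output file name are all illustrative assumptions, not something stated in the thread. Plotting needs matplotlib, which is imported only inside `plot_points`.

```python
import math


def mean(values):
    return sum(values) / len(values)


def scatter_points(fpkm, male_cols, female_cols, pseudocount=1.0):
    """Return {transcript_id: (x, y)} with x = log2 mean male FPKM and
    y = log2 mean female FPKM. The pseudocount avoids log2(0)."""
    points = {}
    for tid, row in fpkm.items():
        male_mean = mean([row[i] for i in male_cols])
        female_mean = mean([row[i] for i in female_cols])
        points[tid] = (math.log2(male_mean + pseudocount),
                       math.log2(female_mean + pseudocount))
    return points


def plot_points(points, out_png="tf_scatter.png"):
    """Draw the scatter plot with the slope-1 diagonal (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # write to file, no display needed
    import matplotlib.pyplot as plt

    xs = [xy[0] for xy in points.values()]
    ys = [xy[1] for xy in points.values()]
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=5)
    lim = max(xs + ys)
    ax.plot([0, lim], [0, lim], linestyle="--")  # equal expression in both sexes
    ax.set_xlabel("log2(mean male FPKM + 1)")
    ax.set_ylabel("log2(mean female FPKM + 1)")
    fig.savefig(out_png)


# Toy table: columns 0-2 are male replicates, 3-5 are female replicates
fpkm = {
    "tx1": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],   # same in both sexes
    "tx2": [1.0, 1.0, 1.0, 7.0, 7.0, 7.0],   # higher in females
}
points = scatter_points(fpkm, male_cols=[0, 1, 2], female_cols=[3, 4, 5])
print(points["tx1"])  # (2.0, 2.0)
print(points["tx2"])  # (1.0, 3.0)
```

Working on a log scale keeps a handful of highly expressed transcripts from squeezing everything else into the corner of the plot.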
My criteria for selecting DE-TFs (differentially expressed transcription factors) were FDR and fold change (as I have mentioned, I used DESeq2). Now I think a scatter plot of log2 TMM, FPKM or TPM values is not going to show the TFs that are significant in DESeq2 as significant.
What do you think ?
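One possible way to reconcile the two views is to keep the expression scatter plot but colour each point by its DESeq2 call. The sketch below is only an illustration under stated assumptions: the cutoffs (`padj < 0.05`, `|log2FC| >= 1`) are placeholders because the thread does not give the actual thresholds, a positive log2 fold change is assumed to mean "higher in females" (this depends on how the DESeq2 contrast was set up), and all names and toy values are hypothetical.

```python
def group_by_deseq_call(points, deseq, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Group scatter points by DESeq2 call.

    points: {transcript_id: (x, y)} expression coordinates
    deseq:  {transcript_id: (log2FoldChange, padj)}; padj may be None (NA)
    Assumption: log2FoldChange > 0 means higher in females.
    """
    groups = {"female_up": [], "male_up": [], "not_de": []}
    for tid, (x, y) in points.items():
        lfc, padj = deseq.get(tid, (0.0, None))
        significant = (padj is not None and padj < padj_cutoff
                       and abs(lfc) >= lfc_cutoff)
        if significant and lfc > 0:
            key = "female_up"
        elif significant:
            key = "male_up"
        else:
            key = "not_de"
        groups[key].append((tid, x, y))
    return groups


def plot_groups(groups, out_png="tf_scatter_de.png"):
    """Draw each group in its own colour (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    colours = {"not_de": "grey", "female_up": "red", "male_up": "blue"}
    fig, ax = plt.subplots()
    for name, members in groups.items():
        ax.scatter([m[1] for m in members], [m[2] for m in members],
                   s=5, color=colours[name], label=name)
    ax.set_xlabel("log2 mean male expression")
    ax.set_ylabel("log2 mean female expression")
    ax.legend()
    fig.savefig(out_png)


# Toy values, purely illustrative
points = {"tx1": (2.0, 2.0), "tx2": (1.0, 3.0), "tx3": (3.0, 1.0), "tx4": (1.0, 2.5)}
deseq = {"tx1": (0.0, 0.9), "tx2": (2.0, 0.001), "tx3": (-2.0, 0.01), "tx4": (1.5, None)}
groups = group_by_deseq_call(points, deseq)
print([m[0] for m in groups["female_up"]])  # ['tx2']
print([m[0] for m in groups["male_up"]])    # ['tx3']
print([m[0] for m in groups["not_de"]])     # ['tx1', 'tx4']
```

With this approach the significance still comes from DESeq2; the plot only displays it, so a point far from the diagonal can stay grey if its replicates are too variable to pass the FDR cutoff.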