Hello! I need to calculate and plot some data in R, but I am having trouble doing this, so any help will be really appreciated. I have two tables in R which look like this:
Tumor_Sample_Barcode ERBB3 ERBB4 ERCC2 ESR1
10010212 0 0 0 0
10010215 0 0 1 0
10010219 0 0 0 0
10010223 0 0 1 0
10010228 0 0 0 0
10010238 0 0 0 0
10010244 0 0 0 0
10010249 0 0 0 0
One has information about somatic variants found in different tumour samples while the other has information about the germline variants (both have the same "Tumor Sample ID"). A number 1 means that a variant has been found for that gene while a 0 means that there hasn't.
Firstly, I had to calculate the correlations within each table (germline x germline and somatic x somatic) and had no problem with that, but now I need to calculate the correlation between germline and somatic variants and I have no idea how to do that.
Secondly, I noticed that a lot of genes have 0 correlation with every other gene (except themselves), and I would like to delete them from the table to make the data cleaner. How can I achieve this?
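One possible approach to the germline x somatic case, as a minimal sketch: `cor()` accepts a second matrix `y` and then correlates every column of `x` with every column of `y`, provided the rows of both are in the same sample order. The data frames `germline` and `somatic` below are small made-up stand-ins, not the real tables, and the gene columns in them are chosen arbitrarily.

```r
# Toy data standing in for the real tables (hypothetical values).
germline <- data.frame(
  Tumor_Sample_Barcode = c(10010212, 10010215, 10010219, 10010223, 10010228, 10010238),
  ERBB3 = c(0, 1, 0, 1, 0, 0),
  ERCC2 = c(0, 1, 0, 1, 0, 1)
)
somatic <- data.frame(
  Tumor_Sample_Barcode = c(10010238, 10010228, 10010223, 10010219, 10010215, 10010212),
  ERBB4 = c(0, 0, 1, 0, 1, 0),
  ESR1  = c(1, 0, 0, 1, 0, 0)
)

# Align rows by sample ID so row i refers to the same tumour in both tables.
ids <- intersect(germline$Tumor_Sample_Barcode, somatic$Tumor_Sample_Barcode)
g <- as.matrix(germline[match(ids, germline$Tumor_Sample_Barcode), -1])
s <- as.matrix(somatic[match(ids, somatic$Tumor_Sample_Barcode), -1])

# Cross-correlation: rows are germline genes, columns are somatic genes.
gs_corr <- cor(g, s, method = "spearman")

# Genes that never vary give NA; treat those as zero, as in the original code.
gs_corr[is.na(gs_corr)] <- 0
```

The result is a rectangular (germline genes x somatic genes) matrix rather than a square one, so it should be possible to pass it to `pheatmap()` in the same way as the within-table correlations.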
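A rough sketch of one way to drop such genes, assuming the zeros come from replacing NA correlations with 0 (which happens when a gene has the same value in every sample). The table `genes` and its column names are invented for illustration.

```r
# Toy 0/1 table (hypothetical values); GENE_C never varies, so its
# correlations are NA and become 0 after the replacement step.
genes <- data.frame(
  GENE_A = c(0, 1, 0, 1, 0, 1),
  GENE_B = c(0, 1, 0, 1, 1, 1),
  GENE_C = c(0, 0, 0, 0, 0, 0)
)

# cor() warns about the zero-variance column; the warning is expected here.
corr <- suppressWarnings(cor(genes, method = "spearman"))
corr[is.na(corr)] <- 0

# Ignore the diagonal, then keep genes with at least one non-zero correlation.
off_diag <- corr
diag(off_diag) <- 0
keep <- rowSums(abs(off_diag) > 0) > 0

# Subset rows and columns together so the matrix stays square.
corr_clean <- corr[keep, keep, drop = FALSE]
```

If "zero" should also cover very weak correlations, the comparison `abs(off_diag) > 0` could use a small threshold instead; the right value depends on the data.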
This is the piece of code I am using to calculate and plot my correlations:
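library(pheatmap)
# Spearman correlation between all pairs of genes (columns)
g_spearmancorr <- cor(genes_gvcount, use = "complete.obs", method = "spearman")
# Replace NA correlations (e.g. from genes with no variation) with 0
g_spearmancorr[is.na(g_spearmancorr)] <- 0
pheatmap(g_spearmancorr, cluster_rows = FALSE, cluster_cols = FALSE,
         fontsize_col = 5, fontsize_row = 3)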
Sorry if my questions are too basic, but I am fairly new to R and to programming in general. Thanks in advance!