Question

How to Visualise Genes per Sample (Text Visualisation)


7.2 years ago

Kritika ▴ 270

Hi, I have a character matrix. I want to visualise all genes per sample, i.e. every gene should be shown together with its sample name. Is there any approach to do this? My data look like this:

MED20_Average    Glycyrrhizic_acid_rep_1
YIPF2_Average    Glycyrrhizic_acid_rep_1
PISD             Glycyrrhizic_acid_rep_1
AURKAIP1         Glycyrrhizic_acid_rep_1
BCL7C            Glycyrrhizic_acid_rep_1
PTCRA_Average    Hydroxysafflor_yellow_A
VPS53_Average    Hydroxysafflor_yellow_A
PTPN9            Hydroxysafflor_yellow_A
PHC3_Average     Anhydroicaritin
SCCPDH_Average   Anhydroicaritin
SOCS2_Average    Anhydroicaritin
SP2_Average      Anhydroicaritin
LMTK2            Anhydroicaritin
TIMM10B          Anhydroicaritin
GEMIN8           Anhydroicaritin
ABHD17B          Anhydroicaritin
ANKMY1_Average   Hyperoside
F11R_Average     Hyperoside

Text Visualisation • 1.8k views

updated 7.2 years ago by steve ★ 3.5k • written 7.2 years ago by Kritika ▴ 270


It is not clear exactly what you want to do. It appears that your list is already grouped by the second column. Where are the sample names?

7.2 years ago by GenoMax 149k

score 2 · Answer 1 · 2017-12-27

If I am allowed to post two answers: I just figured this one out, and I like it better.

#!/usr/bin/env Rscript

# visualize a character matrix

# install.packages("DiagrammeR")
library("DiagrammeR")


data <- structure(list(gene = structure(c(8L, 18L, 10L, 3L, 4L, 11L, 
17L, 12L, 9L, 13L, 14L, 15L, 7L, 16L, 6L, 1L, 2L, 5L), 
.Label = c("ABHD17B", "ANKMY1_Average", "AURKAIP1", 
"BCL7C", "F11R_Average", "GEMIN8", "LMTK2", 
"MED20_Average", "PHC3_Average", "PISD", 
"PTCRA_Average", "PTPN9", "SCCPDH_Average", 
"SOCS2_Average", "SP2_Average", "TIMM10B", 
"VPS53_Average", "YIPF2_Average"), 
class = "factor"), 
sample = structure(c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L,4L),
.Label = c("Anhydroicaritin", "Glycyrrhizic_acid_rep_1", 
"Hydroxysafflor_yellow_A", "Hyperoside"), 
class = "factor")), 
.Names = c("gene","sample"),
class = "data.frame", row.names = c(NA, -18L))

head(data)
#            gene                  sample
# 1 MED20_Average Glycyrrhizic_acid_rep_1
# 2 YIPF2_Average Glycyrrhizic_acid_rep_1
# 3          PISD Glycyrrhizic_acid_rep_1
# 4      AURKAIP1 Glycyrrhizic_acid_rep_1
# 5         BCL7C Glycyrrhizic_acid_rep_1
# 6 PTCRA_Average Hydroxysafflor_yellow_A


# one node per distinct gene or sample name
uniquenodes <- unique(c(as.character(data[["gene"]]), as.character(data[["sample"]])))
nodes <- create_node_df(n = length(uniquenodes), 
                        type = "number", 
                        label = uniquenodes)
# one edge per row, from the sample node to the gene node
# (node IDs are 1..n, i.e. positions in uniquenodes)
edges <- create_edge_df(from = match(as.character(data[["sample"]]), uniquenodes), 
                        to = match(as.character(data[["gene"]]), uniquenodes), 
                        rel = "related")
g <- create_graph(nodes_df = nodes, 
                  edges_df = edges)
render_graph(g)
# export_graph() needs the DiagrammeRsvg and rsvg packages to write a PNG:
# devtools::install_github('rich-iannone/DiagrammeRsvg')
# install.packages("rsvg")
export_graph(g, "genes_diagram.png")

Output:

[image: genes_diagram]

References

http://rich-iannone.github.io/DiagrammeR/ndfs_edfs.html

https://github.com/rich-iannone/DiagrammeR#using-data-frames-to-define-graphviz-graphs
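Since DiagrammeR draws these graphs with Graphviz, the same sample-to-gene edges could also be written out directly as Graphviz DOT text, which any Graphviz tool can render. A minimal base-R sketch, assuming a data frame with gene and sample columns like the one above and names that contain no double quotes; genes_to_dot() is a hypothetical helper written for illustration (not a DiagrammeR function), and the toy data is a subset of the question's:

```r
# genes_to_dot() is a hypothetical helper: it builds a Graphviz DOT string
# with one directed edge per row, from the sample node to the gene node.
# Assumes names contain no double quotes (they are not escaped).
genes_to_dot <- function(df) {
  edges <- sprintf('  "%s" -> "%s";',
                   as.character(df$sample), as.character(df$gene))
  paste(c("digraph genes {", "  rankdir = LR;", edges, "}"), collapse = "\n")
}

# a small subset of the data from the question
df <- data.frame(
  gene   = c("MED20_Average", "PISD", "PTPN9", "LMTK2"),
  sample = c("Glycyrrhizic_acid_rep_1", "Glycyrrhizic_acid_rep_1",
             "Hydroxysafflor_yellow_A", "Anhydroicaritin"),
  stringsAsFactors = FALSE
)

dot <- genes_to_dot(df)
cat(dot, "\n")

# to render in R (requires DiagrammeR): DiagrammeR::grViz(dot)
```

Alternatively, writeLines(dot, "genes.dot") saves the text so that the Graphviz command-line tool can render it, e.g. dot -Tpng genes.dot -o genes.png.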

score 0 · Answer 2 · 2017-12-27

Here is something in R that does what you request:

#!/usr/bin/env Rscript

# visualize a character matrix

library("ggplot2")
# install.packages("ggrepel")
library("ggrepel") # for spreading text labels on the plot

lines <- "
MED20_Average   Glycyrrhizic_acid_rep_1

YIPF2_Average   Glycyrrhizic_acid_rep_1

PISD    Glycyrrhizic_acid_rep_1

AURKAIP1    Glycyrrhizic_acid_rep_1

BCL7C   Glycyrrhizic_acid_rep_1

PTCRA_Average   Hydroxysafflor_yellow_A

VPS53_Average   Hydroxysafflor_yellow_A

PTPN9   Hydroxysafflor_yellow_A

PHC3_Average    Anhydroicaritin

SCCPDH_Average  Anhydroicaritin

SOCS2_Average   Anhydroicaritin

SP2_Average Anhydroicaritin

LMTK2   Anhydroicaritin

TIMM10B Anhydroicaritin

GEMIN8  Anhydroicaritin

ABHD17B Anhydroicaritin

ANKMY1_Average  Hyperoside

F11R_Average    Hyperoside

"

con <- textConnection(lines)
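# stringsAsFactors = TRUE keeps gene and sample as factors (the default became FALSE in R 4.0.0),
# so that as.numeric(gene) in the plot below returns level indices instead of NA
data <- read.delim(con, header = FALSE, sep = "", stringsAsFactors = TRUE)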
close(con)

colnames(data) <- c("gene", "sample")

head(data)
#            gene                  sample
# 1 MED20_Average Glycyrrhizic_acid_rep_1
# 2 YIPF2_Average Glycyrrhizic_acid_rep_1
# 3          PISD Glycyrrhizic_acid_rep_1
# 4      AURKAIP1 Glycyrrhizic_acid_rep_1
# 5         BCL7C Glycyrrhizic_acid_rep_1
# 6 PTCRA_Average Hydroxysafflor_yellow_A

str(data)
# 'data.frame': 18 obs. of  2 variables:
#  $ gene  : Factor w/ 18 levels "ABHD17B","ANKMY1_Average",..: 8 18 10 3 4 11 17 12 9 13 ...
#  $ sample: Factor w/ 4 levels "Anhydroicaritin",..: 2 2 2 2 2 3 3 3 1 1 ...

ggplot(data = data, aes(y = as.numeric(gene), x = as.numeric(gene), label = gene)) + 
    geom_dotplot(alpha = 0) + 
    facet_grid(sample~.) + 
    coord_cartesian(ylim = c(max(as.numeric(data[["gene"]])) + 1, 0 )) +
    theme(
          axis.text.x=element_blank(),
          axis.text.y=element_blank(),
          axis.ticks=element_blank(),
          axis.title.x=element_blank(),
          axis.title.y=element_blank(),
          panel.grid.major=element_blank(),
          panel.grid.minor=element_blank()
          ) +
    geom_text_repel(aes(y = 0, label = gene), show.legend = FALSE, segment.alpha = 0, force = 5) + 
    ggtitle("Genes per Sample")

Output:

[image: genes]
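If a plain-text listing is all that is needed, base R alone can group the genes by sample without any plotting packages. A minimal sketch using split(), assuming a data frame with gene and sample columns as read above; the toy data is a subset of the question's:

```r
# a subset of the data from the question
df <- data.frame(
  gene   = c("MED20_Average", "PISD", "PTPN9", "LMTK2", "GEMIN8", "F11R_Average"),
  sample = c("Glycyrrhizic_acid_rep_1", "Glycyrrhizic_acid_rep_1",
             "Hydroxysafflor_yellow_A", "Anhydroicaritin", "Anhydroicaritin",
             "Hyperoside"),
  stringsAsFactors = FALSE
)

# split() groups the genes by sample; the sample names become the list names
# (sorted alphabetically, since split() converts sample to a factor)
genes_by_sample <- split(df$gene, df$sample)

# one line per sample: "<sample> (n = <count>): gene1, gene2, ..."
summary_lines <- sprintf("%s (n = %d): %s",
                         names(genes_by_sample),
                         lengths(genes_by_sample),
                         vapply(genes_by_sample, paste, character(1), collapse = ", "))
writeLines(summary_lines)
```

Within each sample the genes keep their original row order, so the listing mirrors the input table.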

score 0 · Answer 3 · 2017-12-28


7.1 years ago

Ram ▴ 190

Hi, I think circos.track() from the circlize package would be quite useful for your request; see, for example, dendextend's circlize_dendrogram: https://www.rdocumentation.org/packages/dendextend/versions/1.6.0/topics/circlize_dendrogram
