Question

Import list and assign distinct colors in R

13 months ago

kmyers2 ▴ 90

I have an R script that constructs plots for a number of genes. The gene names and color assignments are hardcoded in the script itself with the following code:

grid.col = c(gene0059="aquamarine",gene0086="bisque",gene0107="blue",gene1658="brown",
             gene0171="brown1",gene0246="burlywood4",gene0404="cadetblue",gene0439="cadetblue1",
             gene0518="chartreuse",gene0716="chartreuse4",gene1099="chocolate",gene1525="chocolate4",
             gene1527="coral",gene1621="coral4",gene1661="cyan",gene1701="cyan3",gene1907="darkgoldenrod",
             gene2223="purple",gene2244="green",gene2552="green4",gene2584="honeydew",
             gene2692="hotpink",gene2874="indianred1",gene2884="khaki1",gene2888="red",
             gene0016="lightblue1",gene0096="lightgoldenrod3",gene1959="lightpink",gene2271="lightsalmon",
             gene2804="lightskyblue",gene2885="magenta",gene3000="blue",gene1689="mediumorchid3",
             gene1776="mediumpurple1",gene2628="olivedrab2",gene0963="orange",gene1137="orangered",
             gene0621="palegreen1",gene2733="royalblue1",gene2806="seagreen1",gene1702="purple1")

When plotting, I pass grid.col to supply the colors. Specifically, I am using it with the chordDiagram command in the circlize package.

When I instead import this from a file as a list (example below), the colors do not get assigned properly to the gene IDs, so the plots do not use the right colors.

gene0059="aquamarine"
...
gene1702="purple1"

How do I both format the file and import it to correctly assign the colors to the gene labels?
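One way to read a file in exactly that name="color" layout is sketched below. This is only a sketch under assumptions: the temporary file and its three lines are made up for illustration, and it assumes one assignment per line with no "=" inside the gene ID.

```r
# Hypothetical file in the name="color" layout, written to a temporary
# path so the example is self-contained.
color_file <- tempfile(fileext = ".txt")
writeLines(c('gene0059="aquamarine"', 'gene0086="bisque"', 'gene1702="purple1"'),
           color_file)

# Read the raw lines and drop any blank ones.
raw_lines <- trimws(readLines(color_file))
raw_lines <- raw_lines[nzchar(raw_lines)]

# Everything before the first "=" is the gene ID; everything after it is
# the color, with the surrounding double quotes stripped.
gene_ids    <- sub("=.*$", "", raw_lines)
color_names <- gsub('"', "", sub("^[^=]*=", "", raw_lines))

# Named character vector, the same shape as the hardcoded c(gene = "color", ...).
grid.col <- setNames(color_names, gene_ids)
print(grid.col)
```

The result is a plain named character vector, so it can be handed to grid.col in the same way as the hardcoded version.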

R ggplot2 • 1.9k views

ADD COMMENT • link 13 months ago by kmyers2 ▴ 90

Please show us your full plotting code.

ADD REPLY • link 13 months ago by Ram 45k

Sure thing. Wasn't sure if it was too much to post:

library(reshape2)
library(circlize)
library(ggplot2)

# process command line argument
args <- commandArgs(TRUE)
working.dir <- args[1]
setwd(working.dir)
# set colors
grid.col = c(gene0059="aquamarine",gene0086="bisque",gene0107="blue",gene1658="brown",
             gene0171="brown1",gene0246="burlywood4",gene0404="cadetblue",gene0439="cadetblue1",
             gene0518="chartreuse",gene0716="chartreuse4",gene1099="chocolate",gene1525="chocolate4",
             gene1527="coral",gene1621="coral4",gene1661="cyan",gene1701="cyan3",gene1907="darkgoldenrod",
             gene2223="purple",gene2244="green",gene2552="green4",gene2584="honeydew",
             gene2692="hotpink",gene2874="indianred1",gene2884="khaki1",gene2888="red",
             gene0016="lightblue1",gene0096="lightgoldenrod3",gene1959="lightpink",gene2271="lightsalmon",
             gene2804="lightskyblue",gene2885="magenta",gene3000="blue",gene1689="mediumorchid3",
             gene1776="mediumpurple1",gene2628="olivedrab2",gene0963="orange",gene1137="orangered",
             gene0621="palegreen1",gene2733="royalblue1",gene2806="seagreen1",gene1702="purple1")

# Plot chord diagrams, one PDF per input file
chord_files <- list.files(path = working.dir, pattern = "_forPlotting\\.txt$")
for (i in chord_files) {
  # build the output file name from the sample name
  sample_name <- sub("_forPlotting.txt", "", i, fixed = TRUE)
  fileName <- paste0(sample_name, "_chordDiagram.pdf")
  pdf(fileName, width = 12, height = 12)

  # keep gene pairs with a count of at least 5, then reshape long to wide
  list1 <- read.table(file = i, sep = "\t", header = TRUE)
  list1_minHit5 <- list1[list1$Count > 4, ]
  matrix1 <- acast(list1_minHit5, geneID_1 ~ geneID_2, value.var = "Count")
  chordDiagram(matrix1, annotationTrack = "grid", transparency = 0.1, grid.col = grid.col,
               preAllocateTracks = list(track.height = max(strwidth(unlist(dimnames(matrix1))))))

  # label each sector with its gene ID
  circos.track(track.index = 1, panel.fun = function(x, y) {
    circos.text(CELL_META$xcenter, CELL_META$ylim[1], CELL_META$sector.index,
                facing = "clockwise", niceFacing = TRUE, adj = c(0, 0.5))
  }, bg.border = NA)
  dev.off()
}

The input file is just a tab-delimited, three-column file that has the first gene ID (geneID_1), the connecting gene ID (geneID_2), and the value that is used for drawing the chord diagrams (Count).

ADD REPLY • link 13 months ago by kmyers2 ▴ 90

The manual uses a matrix (wide form of the long form dataset you're using). See: https://jokergoo.github.io/circlize_book/book/the-chorddiagram-function.html#basic-usage-of-making-chord-diagram

Try getting your data into that format; that might solve the problem.

ADD REPLY • link 13 months ago by Ram 45k

Thanks for the reply. I use acast to convert the DF into the format used by chordDiagram. So I don't think the input data format is the issue, but I will certainly take a look.

ADD REPLY • link 13 months ago by kmyers2 ▴ 90

score 0 · Answer 1 · 2024-03-12

13 months ago

Trivas ★ 1.9k

Your post says you are importing your colors as a list, but you are passing your colors into grid.col as a vector. Try converting your list into a character vector and see if that fixes things.
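As a rough illustration of that conversion (the two-entry list here is made-up example data, not the poster's actual import), unlist() flattens a named list into a named character vector:

```r
# A named list, such as an import step might produce (hypothetical data).
color_list <- list(gene0059 = "aquamarine", gene0086 = "bisque")

# unlist() flattens the list to a character vector and keeps the names.
color_vec <- unlist(color_list)

is.list(color_list)      # TRUE: still a list
is.character(color_vec)  # TRUE: now a plain character vector
names(color_vec)
```

A data.frame returned by read.table is also a list of columns, so the same call applies to it.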

ADD COMMENT • link 13 months ago by Trivas ★ 1.9k

Thanks for the reply. I'll put my edits in a new comment for ease of anyone else searching.

ADD REPLY • link 13 months ago by kmyers2 ▴ 90

That does not make any sense to me - how is that different from your named vector creation using the c(name="color"...) syntax?

ADD REPLY • link 13 months ago by Ram 45k

This lets the user import files if they use different sets of genes in this analysis, rather than have to edit the code.

ADD REPLY • link 13 months ago by kmyers2 ▴ 90

I get that part - how did it make a difference in your case? Were your gene names mismatched between the matrix and the named vector but once you imported them they matched?

ADD REPLY • link 13 months ago by Ram 45k

Ahh, apologies for the confusion. My original code worked fine. But I realized that if I wanted to use another set of genes and colors, I'd have to open the file and change the code. That's fine for me, but I want to share this code with people who have never coded in R. So I wanted to find a way to have the colors and gene names provided as a command line argument so that it was easy to change for different experiments.

So both versions work, the only difference is my original was hard coded and the updated version requires a command line input.

ADD REPLY • link 13 months ago by kmyers2 ▴ 90

"My original code worked fine"

What was the problem then? I'm assuming you had a problem, asked the question here and something happened that fixed the problem. What was the problem and what changed now?

ADD REPLY • link 13 months ago by Ram 45k

The problem was that the original script contained the gene and color designations within the code. My question was how to replicate this by importing a file given on the command line and parsing it so the graph would print properly. I did not know how to do this, and that led to my initial question. I have since figured out (below) how to correctly import and parse the files to replicate the hardcoded part of the script used before. So now I do not have to edit the code every time I want to change the gene IDs or the associated colors; I just import a new file.

ADD REPLY • link 13 months ago by kmyers2 ▴ 90

score 0 · Answer 2 · 2024-03-12

13 months ago

kmyers2 ▴ 90

I got this to work and want to put it here in case anyone finds this in the future.

I imported two data.frames, one with the gene names (only_names.txt) and one with the colors (only_colors.txt). Then I converted both into vectors and combined them into a single named vector:

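# read the gene IDs and the colors, one per line, in matching order
names <- read.table(file = "only_names.txt", header = FALSE, stringsAsFactors = FALSE)
nameVec <- unlist(names, use.names = FALSE)
colors <- read.table(file = "only_colors.txt", header = FALSE, stringsAsFactors = FALSE)
colorsVec <- unlist(colors, use.names = FALSE)
# name each color by its gene ID
my_vector <- setNames(colorsVec, nameVec)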

When I used my_vector in the chordDiagram command it worked to get the colors correct:

chordDiagram(matrix1, annotationTrack = "grid", transparency = 0.1, grid.col = my_vector, preAllocateTracks = list(track.height = max(strwidth(unlist(dimnames(matrix1))))))
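Since the colors are matched to sectors by name, it may be worth checking before plotting that every gene in the matrix has an entry in the color vector. A minimal sketch, using small made-up stand-ins for matrix1 and my_vector (gene9999 is a hypothetical ID with no color):

```r
# Toy stand-ins for matrix1 and my_vector (hypothetical data).
matrix1 <- matrix(c(5, 7, 9, 6), nrow = 2,
                  dimnames = list(c("gene0059", "gene0086"),
                                  c("gene0107", "gene9999")))
my_vector <- c(gene0059 = "aquamarine", gene0086 = "bisque", gene0107 = "blue")

# Every row and column name of the matrix becomes a sector in the diagram.
sectors <- unique(unlist(dimnames(matrix1)))

# Sectors that have no entry in the color vector.
missing_genes <- setdiff(sectors, names(my_vector))
if (length(missing_genes) > 0) {
  message("No color supplied for: ", paste(missing_genes, collapse = ", "))
}
```

Running a check like this ahead of the chordDiagram call catches a gene that was left out of the color file before the plot is drawn.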

ADD COMMENT • link 13 months ago by kmyers2 ▴ 90

You could add a single .csv file as a second command line arg, read it in as a data.frame and make a named vector using the information from the first two columns. That is a lot more user friendly than inputting two separate .txt files that are not obviously paired.

ADD REPLY • link 13 months ago by Trivas ★ 1.9k

Yes, thanks! Here is the code I use to import a single two column text file, then split it up. Much cleaner!

# import name_color_file and set vector for colors to use
# (assumes the file path is passed as the second command line argument)
name_color_file <- read.csv(args[2], header = FALSE, stringsAsFactors = FALSE)

# split file by columns into two vectors
nameVec <- unlist(name_color_file[[1]], use.names = FALSE)
colorVec <- unlist(name_color_file[[2]], use.names = FALSE)

# make a named vector for gene labels and colors
grid.col <- setNames(colorVec, nameVec)

ADD REPLY • link 13 months ago by kmyers2 ▴ 90

This seems like a lot of unnecessary lines. You should be able to use name_color_file$column_name to get each vector directly, rather than creating two additional variables.
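A compact version along those lines might look like the sketch below. The temporary CSV and its contents are made up for illustration; V1 and V2 are the default column names R assigns when a file is read with header = FALSE.

```r
# Hypothetical two-column file: gene ID, then color, no header row.
color_file <- tempfile(fileext = ".csv")
writeLines(c("gene0059,aquamarine", "gene0086,bisque"), color_file)

name_color_file <- read.csv(color_file, header = FALSE, stringsAsFactors = FALSE)

# Name each color (second column, V2) by its gene ID (first column, V1).
grid.col <- setNames(name_color_file$V2, name_color_file$V1)
print(grid.col)
```

In the real script the path would come from the command line rather than from tempfile().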