Question

How to generate an oncoplot with most frequently mutated genes using maftools with annovar annotated output files

0

Entering edit mode

8 months ago

sainavyav22 • 0

Hello,

I have annovar annotated files of my whole genome sequenced data. I want to plot most frequently mutated genes among all my data files along wih the type of mutation (varaint) in an oncoplot. How can I produce this type of graph using the annovar exonic_variant_function file. Can I use maftools for this purpose? Could someone please provide the R script to generate an oncoplot. Thanks.

This is how my data file looks like:

line896 nonsynonymous SNV       gene-SHPRH:XM_005615487.3:exon28:c.C4834T:p.H1612Y,gene-SHPRH:XM_533438.5:exon28:c.C4834T:p.H1612Y,gene-SHPRH:XM_005615488.3:exon28:c.C4834T:p.H1612Y,  NC_006583.3     37085029        37085029
        G       A       het     .       240     93
line922 nonframeshift deletion  gene-STXBP5:XM_533442.6:exon1:c.63_65del:p.21_22del,gene-STXBP5:XM_003432538.4:exon1:c.63_65del:p.21_22del,gene-STXBP5:XM_022417957.1:exon1:c.63_65del:p.21_22del,      NC_006583.3     38271448
        38271450        GCA     -       het     .       235

oncoplot Maftools Annovar • 1.4k views

ADD COMMENT • link updated 8 months ago by Ram 44k • written 8 months ago by sainavyav22 • 0

0

Entering edit mode

What have you tried? Where exactly are you running into problems?

ADD REPLY • link 8 months ago by Ram 44k

0

Entering edit mode

Hi, I used the following script to generate the plot. All the steps ran properly, I got a warning message at the end and the graph was not plotted.

library(dplyr)
library(tidyr)
library(ComplexHeatmap)


# Define the file paths
file_paths <- c("/path/all_files")

# Function to read and process each file
process_file <- function(file_path) {
  # Read the file without headers
  data <- read.table(file_path, header = FALSE, sep = "\t", fill = TRUE, stringsAsFactors = FALSE, na.strings = ".")

  # Assign column names manually (My file doesn't have column names)
  colnames(data) <- c("line", "mutation_type", "genes", "NC", "start", "end", "ref", "alt", "zygosity", "dot", "value1", "value2")

  # Processing steps to extract the gene names correctly
  data <- data %>%
    mutate(genes = strsplit(genes, ",")) %>%  # Split multiple gene annotations
    unnest(genes) %>%
    mutate(gene = gsub("gene-", "", genes),  # Remove the 'gene-' prefix
           gene = sub(":.+$", "", gene)) %>%  # Extract the gene name before ':'
    select(-genes) %>%
    filter(!grepl("^LOC", gene))  # Exclude genes starting with 'LOC'

  return(data)
}

# Read and process all files, then combine the data
all_data <- do.call(rbind, lapply(file_paths, process_file))

# Filter for specific mutation types
filtered_data <- all_data %>%
  filter(mutation_type %in% c("frameshift insertion", "frameshift deletion", "nonsynonymous SNV", "stopgain", "stoploss"))

# Counting the number of mutations per gene
gene_counts <- filtered_data %>%
  group_by(gene) %>%
  summarise(total = n()) %>%
  arrange(desc(total)) %>%
  top_n(20, wt = total)

# Filter the original data to keep only the top 20 genes
filtered_data <- filtered_data %>%
  filter(gene %in% gene_counts$gene)

# Create a matrix with genes as rows and samples as columns
oncoplot_data <- filtered_data %>%
  group_by(gene, line) %>%
  summarise(mutation_type = paste(unique(mutation_type), collapse = ","), .groups = 'drop') %>%
  pivot_wider(names_from = line, values_from = mutation_type, values_fill = list(mutation_type = ""))

# Convert to matrix and set row names
mat <- as.matrix(oncoplot_data[,-1])
rownames(mat) <- oncoplot_data$gene

# Ensure the matrix does not contain NA values
mat[is.na(mat)] <- ""

# Print a portion of the matrix to check it
print(mat[1:10, 1:10])


col = c(
  "frameshift insertion" = "green",
  "frameshift deletion" = "blue",
  "nonsynonymous SNV" = "red",
  "stopgain" = "purple",
  "stoploss" = "orange"
)

# Define the alteration functions
alter_fun = list(
  background = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = "#CCCCCC", col = NA))
  },
  "frameshift insertion" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["frameshift insertion"], col = NA))
  },
  "frameshift deletion" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["frameshift deletion"], col = NA))
  },
  "nonsynonymous SNV" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["nonsynonymous SNV"], col = NA))
  },
  "stopgain" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["stopgain"], col = NA))
  },
  "stoploss" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["stoploss"], col = NA))
  }
)

# Column title and heatmap legend parameters
column_title = "OncoPrint for Top 20 Mutated Genes"
heatmap_legend_param = list(
  title = "Mutations", 
  at = c("frameshift insertion", "frameshift deletion", "nonsynonymous SNV", "stopgain", "stoploss"), 
  labels = c("Frameshift Insertion", "Frameshift Deletion", "Nonsynonymous SNV", "Stopgain", "Stoploss")
)

# Generating oncoplot
pdf("oncoplot.pdf")
oncoPrint(mat, 
          alter_fun = alter_fun, 
          col = col,
          column_title = column_title, 
          heatmap_legend_param = heatmap_legend_param, 
          alter_fun_is_vectorized = FALSE)
dev.off()

Warning message:
All mutation types: nonsynonymous SNV, stopgain, frameshift deletion.
Warning message:
You defined `cell_fun` for a heatmap with more than 100 rows or columns, which might be very slow to
draw. Consider to use the vectorized version `layer_fun`.

Sorry for the late response. Could you please help me understand where the error occurred? Thanks.

ADD REPLY • link updated 8 months ago by Ram 44k • written 8 months ago by sainavyav22 • 0

0

Entering edit mode

Where is the error?

ADD REPLY • link 8 months ago by Ram 44k

0

Entering edit mode

I just received the warning message I posted above, but the graph did not plot.

ADD REPLY • link 8 months ago by sainavyav22 • 0

0

Entering edit mode

D you mean that the pdf file is blank/empty?

ADD REPLY • link 8 months ago by Ram 44k

0

Entering edit mode

It wasn't plotted correctly before. I get it now. Thanks!

ADD REPLY • link 8 months ago by sainavyav22 • 0

0

Entering edit mode

What do you mean by "plotted correctly"? Please explain in more detail so your post can be useful for others facing similar issues.

ADD REPLY • link 8 months ago by Ram 44k