How to generate an oncoplot with most frequently mutated genes using maftools with annovar annotated output files
0
0
Entering edit mode
6 months ago

Hello,

I have annovar annotated files of my whole genome sequenced data. I want to plot most frequently mutated genes among all my data files along wih the type of mutation (varaint) in an oncoplot. How can I produce this type of graph using the annovar exonic_variant_function file. Can I use maftools for this purpose? Could someone please provide the R script to generate an oncoplot. Thanks.

This is how my data file looks like:

line896 nonsynonymous SNV       gene-SHPRH:XM_005615487.3:exon28:c.C4834T:p.H1612Y,gene-SHPRH:XM_533438.5:exon28:c.C4834T:p.H1612Y,gene-SHPRH:XM_005615488.3:exon28:c.C4834T:p.H1612Y,  NC_006583.3     37085029        37085029
        G       A       het     .       240     93
line922 nonframeshift deletion  gene-STXBP5:XM_533442.6:exon1:c.63_65del:p.21_22del,gene-STXBP5:XM_003432538.4:exon1:c.63_65del:p.21_22del,gene-STXBP5:XM_022417957.1:exon1:c.63_65del:p.21_22del,      NC_006583.3     38271448
        38271450        GCA     -       het     .       235
oncoplot Maftools Annovar • 1.0k views
ADD COMMENT
0
Entering edit mode

What have you tried? Where exactly are you running into problems?

ADD REPLY
0
Entering edit mode

Hi, I used the following script to generate the plot. All the steps ran properly, I got a warning message at the end and the graph was not plotted.

library(dplyr)
library(tidyr)
library(ComplexHeatmap)


# Define the file paths
file_paths <- c("/path/all_files")

# Function to read and process each file
process_file <- function(file_path) {
  # Read the file without headers
  data <- read.table(file_path, header = FALSE, sep = "\t", fill = TRUE, stringsAsFactors = FALSE, na.strings = ".")

  # Assign column names manually (My file doesn't have column names)
  colnames(data) <- c("line", "mutation_type", "genes", "NC", "start", "end", "ref", "alt", "zygosity", "dot", "value1", "value2")

  # Processing steps to extract the gene names correctly
  data <- data %>%
    mutate(genes = strsplit(genes, ",")) %>%  # Split multiple gene annotations
    unnest(genes) %>%
    mutate(gene = gsub("gene-", "", genes),  # Remove the 'gene-' prefix
           gene = sub(":.+$", "", gene)) %>%  # Extract the gene name before ':'
    select(-genes) %>%
    filter(!grepl("^LOC", gene))  # Exclude genes starting with 'LOC'

  return(data)
}

# Read and process all files, then combine the data
all_data <- do.call(rbind, lapply(file_paths, process_file))

# Filter for specific mutation types
filtered_data <- all_data %>%
  filter(mutation_type %in% c("frameshift insertion", "frameshift deletion", "nonsynonymous SNV", "stopgain", "stoploss"))

# Counting the number of mutations per gene
gene_counts <- filtered_data %>%
  group_by(gene) %>%
  summarise(total = n()) %>%
  arrange(desc(total)) %>%
  top_n(20, wt = total)

# Filter the original data to keep only the top 20 genes
filtered_data <- filtered_data %>%
  filter(gene %in% gene_counts$gene)

# Create a matrix with genes as rows and samples as columns
oncoplot_data <- filtered_data %>%
  group_by(gene, line) %>%
  summarise(mutation_type = paste(unique(mutation_type), collapse = ","), .groups = 'drop') %>%
  pivot_wider(names_from = line, values_from = mutation_type, values_fill = list(mutation_type = ""))

# Convert to matrix and set row names
mat <- as.matrix(oncoplot_data[,-1])
rownames(mat) <- oncoplot_data$gene

# Ensure the matrix does not contain NA values
mat[is.na(mat)] <- ""

# Print a portion of the matrix to check it
print(mat[1:10, 1:10])


col = c(
  "frameshift insertion" = "green",
  "frameshift deletion" = "blue",
  "nonsynonymous SNV" = "red",
  "stopgain" = "purple",
  "stoploss" = "orange"
)

# Define the alteration functions
alter_fun = list(
  background = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = "#CCCCCC", col = NA))
  },
  "frameshift insertion" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["frameshift insertion"], col = NA))
  },
  "frameshift deletion" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["frameshift deletion"], col = NA))
  },
  "nonsynonymous SNV" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["nonsynonymous SNV"], col = NA))
  },
  "stopgain" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["stopgain"], col = NA))
  },
  "stoploss" = function(x, y, w, h) {
    grid.rect(x, y, w-unit(2, "pt"), h-unit(2, "pt"), 
              gp = gpar(fill = col["stoploss"], col = NA))
  }
)

# Column title and heatmap legend parameters
column_title = "OncoPrint for Top 20 Mutated Genes"
heatmap_legend_param = list(
  title = "Mutations", 
  at = c("frameshift insertion", "frameshift deletion", "nonsynonymous SNV", "stopgain", "stoploss"), 
  labels = c("Frameshift Insertion", "Frameshift Deletion", "Nonsynonymous SNV", "Stopgain", "Stoploss")
)

# Generating oncoplot
pdf("oncoplot.pdf")
oncoPrint(mat, 
          alter_fun = alter_fun, 
          col = col,
          column_title = column_title, 
          heatmap_legend_param = heatmap_legend_param, 
          alter_fun_is_vectorized = FALSE)
dev.off()
Warning message:
All mutation types: nonsynonymous SNV, stopgain, frameshift deletion.
Warning message:
You defined `cell_fun` for a heatmap with more than 100 rows or columns, which might be very slow to
draw. Consider to use the vectorized version `layer_fun`.

Sorry for the late response. Could you please help me understand where the error occurred? Thanks.

ADD REPLY
0
Entering edit mode

Where is the error?

ADD REPLY
0
Entering edit mode

I just received the warning message I posted above, but the graph did not plot.

ADD REPLY
0
Entering edit mode

D you mean that the pdf file is blank/empty?

ADD REPLY
0
Entering edit mode

It wasn't plotted correctly before. I get it now. Thanks!

ADD REPLY
0
Entering edit mode

What do you mean by "plotted correctly"? Please explain in more detail so your post can be useful for others facing similar issues.

ADD REPLY

Login before adding your answer.

Traffic: 2111 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6