Is there a way to write a genbank file in R?
3 days ago
Jack • 0

Hi All,

I'm trying to edit some genbank file fields in R (removing whitespace from gene feature tags) and write the updated output to a new file. Whilst I can read .gbk files (using gggenomes) and perform the edits I want, I cannot for the life of me find any R package that allows me to write a genbank file.

Does anyone know of a way to do this? I noticed BioPython has a number of options but I'm not as familiar with Python so would struggle more with performing the edits I need to make, though if no R option is available I suppose I'll make the switch.

Thank you!

R Genbank Genetics BioConductor • 199 views
3 days ago

From the gggenomes github:

#' Read genbank files
#'
#' Genbank flat files (.gb/.gbk/.gbff) and their ENA and DDBJ equivalents have a
#' particularly gruesome format. That's why [read_gbk()] is just a wrapper
#' around a Perl-based `gb2gff` converter and [read_gff3()].
#'
#' @importFrom readr read_tsv
#' @inheritParams read_gff3
#' @export
#' @return tibble
read_gbk <- function(file, sources = NULL, types = NULL, infer_cds_parents = TRUE) {
  gb2gff <- base::system.file("exec/gb2gff", package = "gggenomes")

  if (file_is_zip(file) && file_ext(file, ignore_zip = FALSE) != "gz") {
    abort(str_glue("Decompressing for genbank only works with gzipped files, not `{suf}`"))
  }

  if (file_is_url(file) && file_is_zip(file)) {
    file <- pipe(str_glue("curl {file} | gzip -cd | {gb2gff} -S"))
  } else if (file_is_url(file)) {
    file <- pipe(str_glue("curl {file} | {gb2gff} -S"))
  } else if (file_is_zip(file)) {
    suf <- file_ext(file, ignore_zip = FALSE)
    if (suf != "gz") {
      abort(str_glue("Decompressing for genbank only works with gzipped files, not `{suf}`"))
    }
    file <- pipe(str_glue("gzip -cd {file} | {gb2gff} -S"))
  } else {
    file <- pipe(str_glue("{gb2gff} -S {file}"))
  }

  read_gff3(file,
    sources = sources, types = types,
    infer_cds_parents = infer_cds_parents
  )
}

All this R function does is run gb2gff to convert the GenBank file to GFF3, and then read that GFF3 in using read_gff3() internally.

So it looks like this package works around the GenBank format by going via GFF3 rather than parsing it directly. It has a converse write_gff3() function, so you could write your edited features out as GFF3 from R and then use a tool outside of gggenomes to convert that GFF3 back to GenBank, such as those discussed at https://bioinformatics.stackexchange.com/questions/11115/existing-tool-for-converting-gff3-to-genbank-gbk
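A rough sketch of the R side of that route, in case it helps. Everything named here is an assumption for illustration: `strip_tag_whitespace()` is a hypothetical helper, not part of gggenomes, and the file names and the `name` column are placeholders, so check `names(feats)` to see which columns your tags actually end up in. The gggenomes part only runs if the package is installed and the input file exists, and I'm assuming `write_gff3()` takes the feature table followed by the output path.

```r
# Hypothetical helper (not part of gggenomes): remove all whitespace
# from the chosen character columns of a feature table.
strip_tag_whitespace <- function(feats, cols) {
  # Only touch requested columns that exist and hold character data
  for (col in intersect(cols, names(feats))) {
    if (is.character(feats[[col]])) {
      feats[[col]] <- gsub("\\s+", "", feats[[col]])
    }
  }
  feats
}

# Toy feature table so the helper can be tried without any input file
toy <- data.frame(
  seq_id = c("contig1", "contig1"),
  type   = c("CDS", "CDS"),
  name   = c("dna A", " rpo B "),
  stringsAsFactors = FALSE
)
cleaned <- strip_tag_whitespace(toy, cols = "name")

# Placeholder paths; replace with your own files
in_file  <- "input.gbk"
out_file <- "edited.gff3"

if (requireNamespace("gggenomes", quietly = TRUE) && file.exists(in_file)) {
  feats <- gggenomes::read_gbk(in_file)                  # GenBank -> tibble (via GFF3)
  feats <- strip_tag_whitespace(feats, cols = "name")    # "name" is an assumed column
  gggenomes::write_gff3(feats, out_file)                 # assumed call: table, then path
}
```

The GFF3 written this way would still need an external GFF3-to-GenBank converter, and because the round trip goes through GFF3, anything in the original GenBank file that GFF3 doesn't carry may not survive it.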
