Hi,
I am trying to find what gene, if any, a specific nucleotide positions belong to. For example I'd like to know what gene nucleotide 100004651 is on. I'm using R currently and would be grateful if anyone can point me towards any packages that could help.
I'm working with the human genome if that helps at all, and also have information on which chromsome the position belongs to and which strand(+/-)
Here is an internal function that I am about to add to Bioconductor's CAGEr package, to solve the same problem.
#' ranges2genes
#'
#' Assign gene symbol(s) to Genomic Ranges.
#'
#' @param ranges Genomics Ranges object, for example extracted from a
#' RangedSummarizedExperiment object with the \code{rowRanges}
#' command.
#'
#' @param genes A \code{\link{GRanges}} object containing \code{gene_name} metadata.
#'
#' @return A character vector of same length as the GRanges object, indicating
#' one gene symbol or a comma-separated list of gene symbols for each
#' range.
#'
#' @importFrom GenomicRanges findOverlaps
#' @importFrom S4Vectors List Rle unstrsplit
#' @importFrom IRanges extractList
#'
#' @examples
#' # Example for Biostars
#' # Imagine that nucleotides are represented by a GRanges object called "gr"
#' # Download GENCODE from Bioc's annotation hub:
#' ah <- AnnotationHub::AnnotationHub()
#' gff <- ah[["AH49556"]]
#' Instead one can also use rtracklayer::import.gff, etc...
#'
#' Annotate the GRanges:
#' gr$geneSymbols <- ranges2genes(gr, gff)
ranges2genes <- function(ranges, genes) {
if (is.null(genes$gene_name))
stop("Annotation must contain ", dQuote("gene_name"), " metdata.")
gnames <- findOverlaps(ranges, genes)
gnames <- as(gnames, "List")
gnames <- extractList(genes$gene_name, gnames)
gnames <- unique(gnames)
gnames <- unstrsplit(gnames, ";")
Rle(gnames)
}
use bedtools to intersect coordinates with gene coordinates