Question

How do I loop over each unique gene and subset a GRanges object by that gene?


2.8 years ago

melissachua90 ▴ 70

I have a data frame, txf_df, which I filtered by gene.list$entrez; I then extracted the unique gene IDs from it. txf_df is then converted to a GRanges object, txf_grange.

Now I want to write a for loop over the 15 unique genes that, on each iteration, subsets txf_grange to only the ranges belonging to that gene.

Traceback:

Error in (function (classes, fdef, mtable)  :    unable to find an inherited method for function ‘subsetByOverlaps’ for signature ‘"GRanges", "character"’

Code:

# Subset by the Entrez IDs
txf_df <- txf_df %>% filter(geneName %in% gene.list$entrez)

# Find the unique gene IDs and count them
unique <- unique(txf_df$geneName)
length(unique)

# Recast this dataframe back to a GRanges object
txf_grange <- makeGRangesFromDataFrame(txf_df, keep.extra.columns = TRUE)

# For each of the 15 genes, subset the Granges objects by only the gene
for (i in unique) {
  subsetByOverlaps(txf_grange, i)
}

Data:

> dput(head(txf_df))
structure(list(seqnames = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "16", class = "factor"), 
    start = c(12058964L, 12059311L, 12059311L, 12060052L, 12060198L, 
    12060198L), end = c(12059311L, 12060052L, 12061427L, 12060198L, 
    12060877L, 12061427L), width = c(348L, 742L, 2117L, 147L, 
    680L, 1230L), strand = structure(c(1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = c("+", "-", "*"), class = "factor"), type = structure(c(3L, 
    1L, 1L, 2L, 1L, 1L), .Label = c("J", "I", "F", "L", "U"), class = "factor"), 
    txName = structure(list(c("uc002dbv.3", "uc010buy.3", "uc010buz.3"
    ), c("uc002dbv.3", "uc010buy.3"), "uc010buz.3", c("uc002dbv.3", 
    "uc010buy.3"), "uc010buy.3", "uc002dbv.3"), class = "AsIs"), 
    geneName = structure(list("608", "608", "608", "608", "608", 
        "608"), class = "AsIs")), row.names = c(NA, 6L), class = "data.frame")

> dput(head(gene.list))
structure(list(Name = c("AQP8", "CLCA1", "GUCA2B", "ZG16", "CA4", 
"CA1"), Pvalue = c(3.24077275512836e-22, 2.57708986670727e-21, 
5.53491656902485e-21, 4.14482213350182e-20, 2.7795892896524e-19, 
1.23890644641685e-18), adjPvalue = c(8.3845272720681e-18, 6.66744690314504e-17, 
1.43199361473811e-16, 1.07234838237959e-15, 7.19135341018869e-15, 
3.20529875816967e-14), logFC = c(-3.73323340223377, -2.96422555675244, 
-3.34493724166712, -2.87787132076412, -2.87670608798164, -3.15664667432159
), entrez = c(AQP8 = "343", CLCA1 = "1179", GUCA2B = "2981", 
ZG16 = "653808", CA4 = "762", CA1 = "759")), row.names = c(NA, 
6L), class = "data.frame")

subsetByOverlaps GenomicRanges GRanges findOverlaps • 959 views

ADD COMMENT • link updated 2.8 years ago by Papyrus ★ 3.0k • written 2.8 years ago by melissachua90 ▴ 70


If you mean that you want to split a GRanges by the values in some column (e.g. gene names), you can use split():

# Create example data
foo <- data.frame(seqnames = paste0("chr",1:10),
                  start = 31:40,
                  end = 41:50,
                  strand = "*",
                  geneName = letters[1:10])

foo <- makeGRangesFromDataFrame(foo, keep.extra.columns = TRUE)

# Filter the GR by genes of interest and then split into GRs

genes <- c("c","b","g")
foo.f <- foo[foo$geneName %in% genes]
foo.s <- split(x = foo.f, f = as.character(foo.f$geneName))
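
For the loop in the question: the error arises because subsetByOverlaps() expects a ranges object (such as another GRanges) as its second argument, not a character string such as a gene ID. Below is a minimal sketch of two ways to get one GRanges per gene. It uses made-up toy data (toy_df, toy_gr are hypothetical names) and assumes geneName is a plain character column; in the question's txf_df it is a list column, so it may need to be converted first, e.g. with as.character().

```r
# Sketch only: toy data stands in for txf_grange; values are invented.
library(GenomicRanges)

toy_df <- data.frame(seqnames = "16",
                     start = c(100L, 200L, 300L, 400L),
                     end = c(150L, 250L, 350L, 450L),
                     strand = "+",
                     geneName = c("608", "608", "343", "759"),
                     stringsAsFactors = FALSE)
toy_gr <- makeGRangesFromDataFrame(toy_df, keep.extra.columns = TRUE)

genes <- unique(toy_gr$geneName)

# Option 1: inside the loop, subset by comparing the metadata column
# to the current gene ID (logical indexing, no overlap search needed).
per_gene <- list()
for (g in genes) {
  per_gene[[g]] <- toy_gr[toy_gr$geneName == g]
}

# Option 2: split once into a GRangesList, then iterate over its elements.
by_gene <- split(toy_gr, toy_gr$geneName)
for (g in names(by_gene)) {
  gr <- by_gene[[g]]
  message(g, ": ", length(gr), " range(s)")
}
```

Both options give the same per-gene subsets; the split() version avoids rescanning the whole object on every iteration, which may matter for larger gene lists.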

ADD REPLY • link 2.8 years ago by Papyrus ★ 3.0k