How do I get ranges for first or last n nucleotides in genomicranges
1
0
Entering edit mode
5.5 years ago
lizaveta • 0

I need to get first (meaning with the most left coordinates possible) or the last n nucleotides positions (let' say, 20 nts but it can be up to the whole genomic region width) in a genomic ranges object grTarget of length 23 kb:

> grTarget

 GRanges object with 2 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25406855      *
  [2]    chr12 25408100-25423323      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

I expect to get:

1) the most left 20 nts:

 GRanges object with 1 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
   -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

1) the most right 20 nts:

 GRanges object with 1 ranges and 0 metadata columns:
  seqnames            ranges strand
     <Rle>         <IRanges>  <Rle>
  [1]    chr12 25423303-25423323      *
   -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

Is there some function that can do this? I only can think of making a function that will take nucleotides in a loop until the length of the subsetted region reaches the desired length:

MostLeftNucleotides = function(grObject, nts_number) {
  for (i_finish in c(1:length(grObject))) {
    if (nts_number <= sum(width(grObject[1:i_finish]))) {
        break
    }
  }
  if (i_finish > 1) {
    grSubset = grObject[1:(i_finish-1)]
    rest_nts = nts_number - sum(width(grSubset))
    LastInterval = grObject[i_finish]
    end(LastInterval) = start(LastInterval) + rest_nts

    grSubset = union(grSubset, LastInterval)
  } else {
    grSubset = grObject[i_finish]
    end(grSubset) = start(grSubset) + nts_number
  }
  return(grSubset) 
}
genomic ranges subsetting • 2.4k views
ADD COMMENT
0
Entering edit mode
5.5 years ago
AK ★ 2.2k

It can be:

grTarget <-
  GRanges(seqnames = "chr12", ranges = IRanges(
    start = c(25398324, 25406855),
    end = c(25406855, 25423323)
  ))

grTarget_L20nt <-
  GRanges(
    seqnames = grTarget@seqnames,
    ranges = IRanges(start = grTarget@ranges@start, width = 21)
  )

grTarget_R20nt <-
  GRanges(
    seqnames = grTarget@seqnames,
    ranges = IRanges(
      start = grTarget@ranges@start + grTarget@ranges@width - 21,
      width = 21
    )
  )

Which returns:

> grTarget_L20nt
GRanges object with 2 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
  [2]    chr12 25406855-25406875      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> grTarget_R20nt
GRanges object with 2 ranges and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25406835-25406855      *
  [2]    chr12 25423303-25423323      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD COMMENT
0
Entering edit mode

Thank you @SMK! your answer, however, is not exactly what I was wondering about. I updated the post with some code I have at the moment, it should make my question clearer.

ADD REPLY
0
Entering edit mode

Alright, flank + range. ;-)

> flank(range(grTarget), width = -21, start = TRUE)
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25398324-25398344      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> flank(range(grTarget), width = -21, start = FALSE)
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]    chr12 25423303-25423323      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD REPLY
0
Entering edit mode

It is already closer :) However, if I need a bigger interval (that spans across the first interval and takes half of the second) - it does not work properly and output ranges goes beyond borders or grTarget and contains 1 interval ( and it should contain 2).

ADD REPLY
0
Entering edit mode

Can you show the problematic data as an example? :-(

ADD REPLY

Login before adding your answer.

Traffic: 1096 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6