Entering edit mode
5.5 years ago
lizaveta
•
0
I need to get first (meaning with the most left coordinates possible) or the last n nucleotides positions (let' say, 20 nts but it can be up to the whole genomic region width) in a genomic ranges object grTarget
of length 23 kb:
> grTarget
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr12 25398324-25406855 *
[2] chr12 25408100-25423323 *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
I expect to get:
1) the most left 20 nts:
GRanges object with 1 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr12 25398324-25398344 *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
1) the most right 20 nts:
GRanges object with 1 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr12 25423303-25423323 *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
Is there some function that can do this? I only can think of making a function that will take nucleotides in a loop until the length of the subsetted region reaches the desired length:
MostLeftNucleotides = function(grObject, nts_number) {
for (i_finish in c(1:length(grObject))) {
if (nts_number <= sum(width(grObject[1:i_finish]))) {
break
}
}
if (i_finish > 1) {
grSubset = grObject[1:(i_finish-1)]
rest_nts = nts_number - sum(width(grSubset))
LastInterval = grObject[i_finish]
end(LastInterval) = start(LastInterval) + rest_nts
grSubset = union(grSubset, LastInterval)
} else {
grSubset = grObject[i_finish]
end(grSubset) = start(grSubset) + nts_number
}
return(grSubset)
}
Thank you @SMK! your answer, however, is not exactly what I was wondering about. I updated the post with some code I have at the moment, it should make my question clearer.
Alright,
flank
+range
. ;-)It is already closer :) However, if I need a bigger interval (that spans across the first interval and takes half of the second) - it does not work properly and output ranges goes beyond borders or
grTarget
and contains 1 interval ( and it should contain 2).Can you show the problematic data as an example? :-(