Assign a genomic region to a window
1
1
Entering edit mode
5.7 years ago
Ana ▴ 200

I am dealing with a bunch of inversions and running some analysis on each of them .One of them is located on chrommose one, the starting position is 6199548 and ending position is 9699260. So the length of this position is 3499712 bp. I want to generate non-overlapping windows based on the length of this focal positions (3499712) and assign the focal position to one of these windows that overlap with the starting and ending position of my focal position. I am achieving this by using GenomicRanges::tileGenome function in r:

seqlength <-c(chr1=159217232,chr2=184765313,chr3=182121094,chr4=221926752,chr5=187413619)
bin_length<-c(3499712) 
 bins<-tileGenome(seqlength,tilewidth=bin_length,cut.last.tile.in.chrom
        = T)
            #bins_u<-unlist(bins)
            write.table(bins,file=paste("bins_good_",bin_length,sep=""),col.names
        = FALSE, row.names = TRUE) }

This generates non-overlapping 3499712 bps windows across the genome.

window_id chromosome start end length
"1" "chr1" 1 3499712 3499712
"2" "chr1" 3499713 6999424 3499712 
"3" "chr1" 6999425 10499136 3499712 
"4" "chr1" 10499137 13998848 3499712 
"5" "chr1" 13998849 17498560 3499712

However I have a major problem here that I have not been able to sort it out. I want one of these windows exactly overlaps with the starting and ending position of my focal position (6199548:9699260), only the focal position not any other position outside this region. This means 2 windows (one before and one after the focal position) will have a different length which my analysis is fine with it. I just do not know how to do it?

This is my desired output:

window_id chromosome start end length
"1" "chr1" 1 3499712 3499712 "*"                  
"2" "chr1" 3499713 6199547 2699834 "*"
"3" "chr1" 6199548  9699260 3499712 "*"
"4" "chr1" 9699261 13998848 4299587 "*"
 "5" "chr1" 13998849 17498560 3499712 "*"

As you see the the second window is now smaller, the third window accommodates the focal position and the forth window is a little larges than bin size. Does anyone have any idea how can I do this task in R? Help me please, I am really stuck!

R windows bioconductor GenomicRanges • 1.8k views
ADD COMMENT
2
Entering edit mode
5.7 years ago

I'm not sure how you did your calculation but for me it looks like your region is 3499713 bp in length.

site <- GRanges("chr1:6199548-9699260")

seqlength <- c(chr1=159217232,chr2=184765313,chr3=182121094,chr4=221926752,chr5=187413619)

bin_length <- width(site)

## check length of region
print(bin_length)
##3499713

bins <- tileGenome(seqlength, tilewidth=bin_length, cut.last.tile.in.chrom=TRUE)

## extract overlapping bins
flank.bins <- subsetByOverlaps(bins,site)

## find new coordinates excluding region of interest
new.flank.bins <- setdiff(reduce(flank.bins),site)

## combine new bins with your region of interest
new.bins <- c(new.flank.bins,site)

## remove old bins
bins <- subsetByOverlaps(bins,new.bins,invert=T)

## add new bins
bins <- sort(c(bins,new.bins))

## check that it makes sense
##table(width(bins[seqnames(bins) == 'chr1']))

write.table(bins,file=paste("bins_good_",bin_length,sep=""),col.names= FALSE, row.names = TRUE)
ADD COMMENT

Login before adding your answer.

Traffic: 2401 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6