Hi,
I have a small set of bam files from centromere regions of S. cerevisiae. Those I have extracted from the complete bam files using the samtools view option based on a list of centromere positions as shown below. I have used the centromere middle point to go forward and reverse 30kb and extract the regions.
chromosome CEN-[30kb] CENmitte CEN+[30kb]
I 121500 151500 181500
II 208300 238300 268300
III 84500 114500 144500
IV 419700 449700 479700 ...
I would like to see how many reads are mapped to each of the positions in these regions. This I would like to count into 50b bins of genome regions. For that I have used the tileGenome command to create a virtual GRanges object as shown below (this is why it has no actual information yet).
bins1 <- tileGenome(seqinfo(Scerevisiae), tilewidth=500, cut.last.tile.in.chrom=TRUE)
GRanges object with 24322 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chrI [ 1, 500] *
[2] chrI [ 501, 1000] *
[3] chrI [1001, 1500] *
[4] chrI [1501, 2000] *
[5] chrI [2001, 2500] *
... ... ... ...
[24318] chrM [83501, 84000] *
[24319] chrM [84001, 84500] *
[24320] chrM [84501, 85000] *
[24321] chrM [85001, 85500] *
[24322] chrM [85501, 85779] *
-------
seqinfo: 17 sequences (1 circular) from sacCer3 genome
using the summarizeOverlaps
command I can now summarize exactly how many reads are mapped into each of my bins in the genome
bamFiles <- list.files(path = ~/centromereBamFIles.withchrIII/", pattern = ".bam$", full.names = TRUE)
olapTable <- summarizeOverlaps(bins1, bamFiles, inter.feature=FALSE)
But the problem is that I still have the complete genome in the bins1 GRanges object. But I would like it to contain only the regions I am using for the centromeres.
Hence my question, is there a way to extract multiple regions from the GRanges object? I would like to extract the centromere regions (± a few 20kb) of each chromosome. e.g. these regions:
chrI:101500-201500
chrII:188300-288300
chrIII:64500-164500
But I can't find a way to do it?
I know I can remove all the rows with a sum of 0, but with that I will also remove the rows from the centromere regions where no reads were mapped. These I need to keep in my GRanges object.. So I will try to work it out with restrict
or findOverlaps
before I run the summarizeOverlaps
with the bam files
Is there a more direct way to do it?
Thanks
Assa
Do you have centromere coordinates ? What do you mean by "I would like to extract the centromere regions (± a few 20kb) of each chromosome"