Hello everyone,
I'm studying genome topology in plants in order to understand the organisation/compartimentalization of the nuclear genome.
I wanted to know a quick method to compute gene density over specific genome regions of interest in order to plot the gene frequencies per region per bin +/- 50Kb from the start and the end of the region.
I wanted to compute the same things as shown in this figure 4D (show bellow) of "3D Chromatin Architecture of Large Plant Genomes Determined by Local A/B Compartments, https://doi.org/10.1016/j.molp.2017.11.005"
Figures :
As you can see in the 4D of the figure panel, the authors compute the gene density per bin (which is normalized by the size of the compartments) related to different genome region type.
Has anyone suggestion to compute this gene density ?
I have do some test (see the code bellow) : I've created a first function to add 50Kb at the start and end of each region :
addFlankingRegion <- function(bedFeaturePath, RegionBedPath, resToAd){
###### read the table
###### For test : RegionBedPath=GACA_melo.path
RegionBedPath=RegionBedPath
regionFile = read.table(RegionBedPath,sep="\t",header=F)
###### feature
###### For test : bedFeaturePath=anno_melo
bedFeaturePath=bedFeaturePath
features <- read.table(bedFeaturePath,sep="\t",header=F)
###### Selecting wanted column
regionFile=regionFile[,1:6]
features=features[,1:6]
###### rename column
colnames(regionFile)=c("chr","start","end", "id", "score", "strand")
colnames(features)=c("chr","start","end", "id", "score", "strand")
###### erase 'gene:' pattern in gene name
features$id=gsub("gene:","",features$id)
###### filtering chr0 in feature files
features=filter(features, chr != "CMiso1.1chr00")
###### Add +/- 50Kb to the genome region coordonate
###### For start :
regionFile = mutate(regionFile, startCust = regionFile$start - resToAd)
###### For end :
regionFile = mutate(regionFile, endCust = regionFile$end + resToAd)
###### return dataframe customized
return(regionFile)
return(features)
}
Then I Try to binarize my compartments( = region of interets) :
makeSlidingWindow <- function(regionFile,windowBinVarName){
###### create regionfile custom with the custom coordonate
regionFileCustom=regionFile[,c(1,7,8,4,5,6)]
###### Convert region dataframe to Granges objects
regionFile.gr = makeGRangesFromDataFrame(regionFileCustom)
###### making sliding windows
######windowBinVarName=GenomicRanges::slidingWindowsregionFile.gr, width = width, step=step) # dividing region by non-overlap 50 bases pairs sub windows
regionBin=GenomicRanges::slidingWindowsregionFile.gr, n = n)
return(windowBinVarName)
}
To finish, I compute the gene density :
computeFeatureFreq <- function(features, regionFile, outDFName){
###### convert feature as GRange
features.gr = makeGRangesFromDataFrame(features)
###### convert regionFile as Grange
regionFile.gr = makeGRangesFromDataFrame(regionFile)
###### Counting gene density per genome region
###### feature.density.windowBin = countOverlapswindowBin.gr,features.gr,ignore.strand=ignore.strand, minoverlap=minoverlap, maxgap=maxgap)
###### feature.density.windowBin = countOverlapsfeatures.gr, windowBin.gr,ignore.strand=ignore.strand, minoverlap=minoverlap, maxgap=maxgap)
feature.density.region = countOverlapsfeatures.gr, regionFile.gr)
###### retreive coordonates for each windowBin
regionFile.df=as.data.frameregionFile.gr)
feature.density.region.df = data.frame(chr=regionFile.df$seqnames, start=regionFile.df$start, end=regionFile.df$end, count=feature.density.region)
return(feature.density.region.df)
}
The issue with this method is that I have not a good binarization for my compartments and I don't know why. More over, I don't know how to plot the computed information after the third function.
If anyone has suggestion thanks for helping me , everything is welcome. Thanks in advance.
You may want to try doing this with something like deepTools plotProfile.
Hi Igor, I've already try deepTools, the problem with this method is that to plot the density with deeptools you must have a bigWig related to the features in that you want to compute the density for region of interest. But in my case I don't have bigWig of my genes, I just have a bed/gff3/gtf file of my all mys genes.
You can make a bedGraph with the same weight for all regions from a bed/gff3/gtf file and then convert that to bigWig.
You can also try some of the other tools, some of which have slightly different input/output options: Visualization of ChIP-seq data using Heatmaps (Updated: 06/10/16)
Which tool do you used to convert bed to bedgraph please ? I think I will try this options
You don't need a specialized tool. You can quickly make it in bash or R or even Excel. bedgraph is like a bed file, but the 4th column is the data value: https://genome.ucsc.edu/goldenPath/help/bedgraph.html
Ok thanks, I understand I will use awk command to do this. To be sure, I set the data value at 1 for each gene ? I mean the data value correspond to the bed score column ?
Sure. You could also normalize by gene length. It really depends on the exact question.
Thanks a lot for your help Igor. I have convert my gene.bed annotation file into bedgraph (see below an exemple ) :
Then I have convert this bedgraph into a bigWig file following this command :
I obtained a binary files that is not empty and not corrupt.
So I have use as you say, deepTools to plot the gene density profile in each genome region type.
It compute the matrix and the different file format of the matrix without errors.
To finish I used plotProfile function of deepTools to plot the gene density following this command:
I obtained a png output file without error. My problem is that as you can see bellow is not really that I want ... haha (If we compared to the figure that I show in the first post).
I thinks It's due to the bed score set up at one.
I want the gene frenquencies per bin for each genome region, should I normalize the score by gene length ?
PS: More over when I check the matrix I see when the region or the bin doesn't overlap with gene it place " NaN" value instead of "0"
Before trying many regions, try just one. Also try using a regular bigwig to see if maybe there is an issue with not having score diversity.