Computing gene frequency of specific regions of interest using R
0
0
Entering edit mode
4.6 years ago
clementpch • 0

Hello everyone,

I'm studying genome topology in plants in order to understand the organisation/compartimentalization of the nuclear genome.

I wanted to know a quick method to compute gene density over specific genome regions of interest in order to plot the gene frequencies per region per bin +/- 50Kb from the start and the end of the region.

I wanted to compute the same things as shown in this figure 4D (show bellow) of "3D Chromatin Architecture of Large Plant Genomes Determined by Local A/B Compartments, https://doi.org/10.1016/j.molp.2017.11.005"

Figures : enter image description here

As you can see in the 4D of the figure panel, the authors compute the gene density per bin (which is normalized by the size of the compartments) related to different genome region type.

Has anyone suggestion to compute this gene density ?

I have do some test (see the code bellow) : I've created a first function to add 50Kb at the start and end of each region :


addFlankingRegion <- function(bedFeaturePath, RegionBedPath, resToAd){
###### read the table
###### For test : RegionBedPath=GACA_melo.path
  RegionBedPath=RegionBedPath
  regionFile = read.table(RegionBedPath,sep="\t",header=F)
###### feature
###### For test : bedFeaturePath=anno_melo
  bedFeaturePath=bedFeaturePath
  features <- read.table(bedFeaturePath,sep="\t",header=F)
###### Selecting wanted column
  regionFile=regionFile[,1:6]
  features=features[,1:6]
###### rename column
 colnames(regionFile)=c("chr","start","end", "id", "score", "strand")
  colnames(features)=c("chr","start","end", "id", "score", "strand")
###### erase 'gene:' pattern in gene name
  features$id=gsub("gene:","",features$id)
###### filtering chr0 in feature files
  features=filter(features, chr != "CMiso1.1chr00")
###### Add +/- 50Kb to the genome region coordonate
###### For start :
  regionFile = mutate(regionFile, startCust = regionFile$start - resToAd)
###### For end :
  regionFile = mutate(regionFile, endCust = regionFile$end + resToAd)
###### return dataframe customized
  return(regionFile)
  return(features)
}

Then I Try to binarize my compartments( = region of interets) :


makeSlidingWindow <- function(regionFile,windowBinVarName){
###### create regionfile custom with the custom coordonate
  regionFileCustom=regionFile[,c(1,7,8,4,5,6)]
###### Convert region dataframe to Granges objects
  regionFile.gr = makeGRangesFromDataFrame(regionFileCustom)
###### making sliding windows
######windowBinVarName=GenomicRanges::slidingWindowsregionFile.gr, width = width, step=step) # dividing region by non-overlap 50 bases pairs sub windows
  regionBin=GenomicRanges::slidingWindowsregionFile.gr, n = n)
  return(windowBinVarName)
}

To finish, I compute the gene density :


computeFeatureFreq <- function(features, regionFile, outDFName){
###### convert feature as GRange
  features.gr = makeGRangesFromDataFrame(features)
###### convert regionFile as Grange
  regionFile.gr = makeGRangesFromDataFrame(regionFile)
###### Counting gene density per genome region
###### feature.density.windowBin = countOverlapswindowBin.gr,features.gr,ignore.strand=ignore.strand, minoverlap=minoverlap, maxgap=maxgap)
###### feature.density.windowBin = countOverlapsfeatures.gr, windowBin.gr,ignore.strand=ignore.strand, minoverlap=minoverlap, maxgap=maxgap)
  feature.density.region = countOverlapsfeatures.gr, regionFile.gr)
###### retreive coordonates for each windowBin
  regionFile.df=as.data.frameregionFile.gr)
  feature.density.region.df = data.frame(chr=regionFile.df$seqnames, start=regionFile.df$start, end=regionFile.df$end, count=feature.density.region)
  return(feature.density.region.df)
}


The issue with this method is that I have not a good binarization for my compartments and I don't know why. More over, I don't know how to plot the computed information after the third function.

If anyone has suggestion thanks for helping me , everything is welcome. Thanks in advance.

genome gene R • 1.3k views
ADD COMMENT
0
Entering edit mode

You may want to try doing this with something like deepTools plotProfile.

ADD REPLY
0
Entering edit mode

Hi Igor, I've already try deepTools, the problem with this method is that to plot the density with deeptools you must have a bigWig related to the features in that you want to compute the density for region of interest. But in my case I don't have bigWig of my genes, I just have a bed/gff3/gtf file of my all mys genes.

ADD REPLY
0
Entering edit mode

You can make a bedGraph with the same weight for all regions from a bed/gff3/gtf file and then convert that to bigWig.

You can also try some of the other tools, some of which have slightly different input/output options: Visualization of ChIP-seq data using Heatmaps (Updated: 06/10/16)

ADD REPLY
0
Entering edit mode

Which tool do you used to convert bed to bedgraph please ? I think I will try this options

ADD REPLY
0
Entering edit mode

You don't need a specialized tool. You can quickly make it in bash or R or even Excel. bedgraph is like a bed file, but the 4th column is the data value: https://genome.ucsc.edu/goldenPath/help/bedgraph.html

ADD REPLY
0
Entering edit mode

Ok thanks, I understand I will use awk command to do this. To be sure, I set the data value at 1 for each gene ? I mean the data value correspond to the bed score column ?

ADD REPLY
0
Entering edit mode

Sure. You could also normalize by gene length. It really depends on the exact question.

ADD REPLY
0
Entering edit mode

Thanks a lot for your help Igor. I have convert my gene.bed annotation file into bedgraph (see below an exemple ) :


chr1    19056   19132   1
chr1    23012   23989   1
chr1    23990   25394   1
chr1    39764   40757   1
chr1    45872   46172   1
chr1    58908   59872   1
chr1    59875   61036   1
chr1    61803   61875   1
chr1    76843   77041   1
chr1    88923   92091   1


Then I have convert this bedgraph into a bigWig file following this command :

bedGraphToBigWig $bedGraphGene $chromSize ${bigWigOutDir}/${bedgraphName}.bw

I obtained a binary files that is not empty and not corrupt.

So I have use as you say, deepTools to plot the gene density profile in each genome region type.


computeMatrix scale-regions -S ${bigWigOutDir}/${bedgraphName}.bw -R $GACA $GACB $GBCA $GBCB -b 50000 -a 50000 --regionBodyLength 70000 -o ${mtxOutDir}/matrix_genomeRegion_geneDensity_saled.gz --outFileNameMatrix ${mtxOutDir}/matrix_genomeRegion_geneDensity_scaled.tab --outFileSortedRegions ${mtxOutDir}/matrix_genomeRegion_geneDensity.bed

It compute the matrix and the different file format of the matrix without errors.

To finish I used plotProfile function of deepTools to plot the gene density following this command:

plotProfile -m ${mtxOutDir}/matrix_genomeRegion_geneDensity_saled.gz \
              -out plot_genomeRegion_geneDensity.png \
              --numPlotsPerRow 2 \
              --averageType "mean" \
              --plotTitle "gene and TE density profile" \
              --regionsLabel $regLabelsList

I obtained a png output file without error. My problem is that as you can see bellow is not really that I want ... haha (If we compared to the figure that I show in the first post).

output png picture from deepTools plot-genome-Region-gene-Density

I thinks It's due to the bed score set up at one.

I want the gene frenquencies per bin for each genome region, should I normalize the score by gene length ?

PS: More over when I check the matrix I see when the region or the bin doesn't overlap with gene it place " NaN" value instead of "0"

ADD REPLY
0
Entering edit mode

Before trying many regions, try just one. Also try using a regular bigwig to see if maybe there is an issue with not having score diversity.

ADD REPLY

Login before adding your answer.

Traffic: 1753 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6