Question

Construction Of Average Gene Profile

2

Entering edit mode

11.8 years ago

Dataminer ★ 2.8k

Hi!

I want to construct a average gene profile of my ChIP-seq data (BAM file). What I am aiming at is to get the normalised length of all genes in refseq or ensembl and bin them in say 10 bins and construct the profile, this will assist to visualise if the binding is on TSS, Genebody, TES etc.

Is their already a ready made solution in offering ...... by someone who has done this before?

As an example figure 2C

Thank you

chip-seq python perl r • 6.3k views

ADD COMMENT • link updated 11.4 years ago by Biostar 20 • written 11.8 years ago by Dataminer ★ 2.8k

score 3 · Answer 1 · 2013-02-15

3

Entering edit mode

11.8 years ago

Madelaine Gogol 5.3k

Not in the category of ready-made, but...

I have done this in R, but my code isn't really pretty. I'm sure there's better ways to do it in R (probably not involving slow for loops), but I'm still learning to use all these sequencing packages... Here's some crappy combination of R code and pseudocode as a start. Maybe someone else will be so annoyed by this that they post actual nice R code.

   #First, I don't actually know if I'm using all these packages, I just include them in case.
   library(GenomicRanges)
   library(rtracklayer)
   library(Rsamtools)

   #read in a bam file
   temp<-readGappedAlignments(bamfile)
   #summarize into a coverage Rle
   cvg <- coverage(temp)
   #may want to scale coverage here based on number of reads

   #then I loop through a bed file of genes pulling out the coverage values for each and sticking them in a data frame
   for each gene i
   {
      #use approx to create the bins, in this case 20... and save it into some data frame...
      df[i,]<-approx(window(cvg[[chrom]],start,end),n=20)$y
   }

Additionally you need to remember to use rev() on the coverage values for genes on the C/- strand... Also if you're adding bases up and downstream of a gene you need to check that you don't go past the start or end of the chromosome.

You may also take a look at the R package Repitools for a possibly more ready-made solution, but I haven't tried it...

ADD COMMENT • link 11.8 years ago by Madelaine Gogol 5.3k

0

Entering edit mode

@Madelaine: could you please provide the complete code that you used, thank you.

ADD REPLY • link 11.8 years ago by Dataminer ★ 2.8k

1

Entering edit mode

Errr... This should be enough to understand how to basically set it up if you know R.

The code I summarized this from is doing a lot more involving summarizing up and downstream regions of genes in addition to the gene body, checking for overlapping genes, etc., so I feel it would be too confusing and messy to really be a big help in its entirety.

ADD REPLY • link 11.8 years ago by Madelaine Gogol 5.3k

0

Entering edit mode

I agree with Madelaine that it would probably clutter things up to add more. One comment though. She's using bioconductor so you would need to install that. If you are just starting R (and probably for a lot of people that know R but don't do bioinformatics with it), that might not be obvious. Here is how you install it, http://www.bioconductor.org/install/. All the packages she's using are on the bioconductor website.

ADD REPLY • link 11.8 years ago by KCC ★ 4.1k

0

Entering edit mode

I followed the link from seqanwser. actually I found ngsplot https://code.google.com/p/ngsplot/ can do exactly the same thing you (I) want to do.

ADD REPLY • link 11.4 years ago by Ming Tommy Tang ★ 4.5k

0

Entering edit mode

Looking at this code, I was wondering if it is sensitive to genes on the forward vs. the reverse strand. One needs to reverse the order of traversal of the gene based on the direction in which the gene is transcribed to characterize the TSS and TES correctly.

ADD REPLY • link 11.4 years ago by KCC ★ 4.1k

0

Entering edit mode

Yep, that's why I said "Additionally you need to remember to use rev() on the coverage values for genes on the C/- strand... "

ADD REPLY • link 11.1 years ago by Madelaine Gogol 5.3k

score 2 · Answer 2 · 2013-02-15

2

Entering edit mode

11.8 years ago

Istvan Albert 101k

There are libraries to assist you in this, although each would probably need to be tuned to your problem set. For example in Python you can user the HTSeq package like so:

http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html#tss

ADD COMMENT • link 11.8 years ago by Istvan Albert 101k

score 2 · Answer 3 · 2013-02-15

Have you looked at CEAS from the Liu Lab, http://liulab.dfci.harvard.edu/CEAS/ ? It gives you information on how the profile of your signal is changing in the TSS, TTS and the Genebody. It can also do this for specific lists of genes.

Also, if you have a BED file then Cistrome (also from the Liu lab) can give you a profile of your reads going into the BED regions. It's as simple as entering your wig, your bed file and clicking for the algorithm to run. The output is a PDF file. I think you will have to sign up to use it though.

For both of these, you need to find a way to translate your bam into a wig I think.