How To Plot Read Coverage Over Many Different Genomic Regions?
4
4
Entering edit mode
11.8 years ago
Ian 6.1k

I often have ChIP-seq experiments where I want to get a feel for read coverage over the predicted binding regions. Looking through the UCSC genome browser region by region is slow and labourious. What i am looking for is a way of plotting read coverage (from bedGraph, Wig etc) for many different binding regions and present many of these plots on one page.

I envisage the input would be a bedGraph/Wig file and a list of binding region coordinates. I am aware of a previous Batch Viewing Of Ucsc Browser Graphic that kind of covers this, but it uses UCSC/IGV. I am looking for something much more simplistic - just a line graph per region. Even more, it would be great to be able to plot ChIP and input read coverage on the same graph.

I wonder whether some Python guru etc. has already tackled this?

Many thanks!

coverage chip-seq • 12k views
ADD COMMENT
7
Entering edit mode
10.7 years ago
Fidel ★ 2.0k

I will like to update my previous answer:

We have made available a suite of tools called deepTools to make this sort of visualizations very easy. There is a tool called profiler that will plot exactly what you want. You provide a list of regions (BED or GFF format) and a bigWig file and the output is a profile. Here you can find the documentation on how to run the tool. You can easily convert any wig/bedgraph file to a bigWig using the tools from UCSC that can dowloaded here: http://hgdownload.cse.ucsc.edu/admin/exe/

The tool uses bigWig files to parallelize computations.

enter image description here

ADD COMMENT
5
Entering edit mode
11.8 years ago
Irsan ★ 7.8k

Invest some time to learn the R-package ggplot2. I made an example for you below:

# create copy number data for 4 samples for 3 genes 
data<-data.frame(
    gene=sample(c("geneA","geneB","geneC"),10000,replace=TRUE),
    sampleName=sample(c("sample1","sample2","sample3","sample4"),10000,replace=TRUE),
    position=round(runif(n=10000,min=0,max=5000),digits=0),
    copy_number=rnorm(n=10000,mean=2,sd=0.1)
)

# create copy number gain in sample 4 gene A
data[data$gene=="geneA" & data$sampleName=="sample1",4]<-rnorm(
    mean=3,
    sd=0.1,
    n=length(
        data[data$gene=="geneA" & data$sampleName=="sample1",4]
    )
)

# and a focal loss at sample 3, gene B
data[data$gene=="geneB" & data$sampleName=="sample3" & data$position > 2000 & data$position < 4000,4]<-rnorm(
    mean=1,
    sd=0.1,
    n=length(
        data[data$gene=="geneB" & data$sampleName=="sample3" & data$position > 2000 & data$position < 4000,4]
    )
)

# plot copy numbers for all genes for all samples
library(ggplot2)
plot<-ggplot(data) + #add basic layer
geom_point(aes(x=position,y=copy_number)) + #add copy number points to plot
facet_wrap(gene ~ sampleName) + # seperate the CN numbers for each gene and for each sample
scale_y_continuous(limits=c(0,4)) # set y-limits

# and save plot to file
png("copy_numbers.png",1000,1000)
print(plot)
dev.off()
ADD COMMENT
1
Entering edit mode
11.8 years ago
Ian 6.1k

The fruit of my own searching, in parallel to asking this question is a tool called Dpeaks.

It is not exactly what i am after, but it produces plots (in various formats, drawn by R), based on user supplied variable-WIG file(s).

There is one Perl script to run the process. Another Perl script all the conversion of fixed-WIG to variable-WIG. By trial and error i found you can also start of with a BedGraph file and convert it, using BedToWig.sh to a variable-WIG file, see HERE for more information.

BTW MACS output the required variable-WIG format.

ADD COMMENT
0
Entering edit mode
11.8 years ago
Fidel ★ 2.0k

What we do here is to plot a heatmap where rows represents each regions and each column is a bin within the regions. This is very common method that you often see in the the literature.

Normally we use the ratio between the ChIP-seq data and the input, otherwise you may see the biases instead of the signal.

We use our own software for this but I will be happy to share it if you are interested.

ADD COMMENT
0
Entering edit mode

Thank you for you answer, but i really want to see the profile (peak shape) of the IP relative to the input.

ADD REPLY
1
Entering edit mode

Then go for the ggplot2 solution and change

geom_point(aes(x=position,y=copy_number))

to

geom_line(aes(x=position,y=copy_number))
ADD REPLY
0
Entering edit mode

I forgot to mention that instead of a heatmap, it is also posible to plot the the profile as the average over all regions for each bin.

ADD REPLY

Login before adding your answer.

Traffic: 1943 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6