Plotting Reads Around TSS

6

11.5 years ago

ChIP ▴ 600

Hi!

I am almost sure that people actively involved in ChIP-seq data analysis have plotted mapped reads around the TSS within a certain window (say, 10 kb).

What I want to do is to plot reads of my histone marks (in bam file) around TSS with CpG and TSS without CpG (Essentially a coverage profile).

It would be very kind of you if you could share the script you used for this.

Kindly help.

chip-seq • 15k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 11.5 years ago by ChIP ▴ 600

0

I would not recommend plotting raw read counts directly from BAM, as this may give false impressions due to biases. For example, promoters by default have more reads due to GC and open-chromatin bias. Thus, it is better to use log2 ratios or the difference between treatment and input.
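As a rough illustration of the log2-ratio idea, here is a minimal numpy sketch. It assumes you already have per-bin read counts for the ChIP and input samples over the same bins, plus the total read count of each library. The function name `log2_ratio`, the example numbers, and the pseudocount of 1 are assumptions made for illustration and do not come from any particular tool.

```python
import numpy as np


def log2_ratio(chip_profile, input_profile, chip_total, input_total, pseudocount=1.0):
    """Return log2(ChIP / input) per bin after scaling both to reads per million.

    The pseudocount (an arbitrary choice here) avoids division by zero in empty bins.
    """
    chip_rpm = np.asarray(chip_profile, dtype=float) / (chip_total / 1e6)
    input_rpm = np.asarray(input_profile, dtype=float) / (input_total / 1e6)
    return np.log2((chip_rpm + pseudocount) / (input_rpm + pseudocount))


# Hypothetical per-bin counts around a TSS (made-up numbers).
chip = [10, 40, 160, 40, 10]
inp = [10, 20, 40, 20, 10]
ratio = log2_ratio(chip, inp, chip_total=1000000, input_total=1000000)
print(ratio)
```

Bins where ChIP and input are equally covered come out at 0, so a promoter that is merely "open" in both samples no longer looks enriched.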

ADD REPLY • link updated 2.3 years ago by Ram 45k • written 9.8 years ago by Fidel ★ 2.0k

5

11.5 years ago

Ryan Dale 5.0k

The Python package metaseq (docs) will do this directly from BAM files. Here's an example gist showing how you'd use it:

(edit: fix embedded gist)

Also, you may want to see this worked example from the docs: https://pythonhosted.org/metaseq/example_session.html

	# From http://www.biostars.org/p/83800/:
	#
	# "What I want to do is to plot reads of my histone marks (in bam file)
	# around TSS with CpG and TSS without CpG (Essentially a coverage profile)."
	#
	# To install metaseq and dependencies, see:
	#
	#     https://pythonhosted.org/metaseq/install.html
	#
	# To download the example data used here, make sure you're in the directory
	# this script is saved in, and then use:
	#
	#     git clone https://gist.github.com/a2e63a2fb93d05341de5.git demo_data
	#
	# (Or see https://gist.github.com/daler/a2e63a2fb93d05341de5 and download the
	# files individually)


	import metaseq
	import pybedtools
	import numpy as np
	from matplotlib import pyplot as plt

	bam = metaseq.genomic_signal('demo_data/h3k4me3-chr21.bam', 'bam')
	cpg = pybedtools.BedTool('demo_data/cpg-chr21.bed.gz')
	tss = pybedtools.BedTool('demo_data/tss-chr21.bed.gz')

	# extend by 5 kb up/downstream
	tss = tss.slop(b=5000, genome='hg19')

	tss_with_cpg = tss.intersect(cpg, u=True)
	tss_without_cpg = tss.intersect(cpg, v=True)

	# change this to as many CPUs as you have in order to run in parallel
	processes = 1

	# each read will be extended 3' to a total size of this many bp
	fragment_size = 200

	# the region +/-5kb around each TSS will be split into a total of 100 bins,
	# change as needed
	bins = 100

	x = np.linspace(-5000, 5000, bins)

	# most of the work happens here
	y1 = bam.array(tss_with_cpg, bins=bins, processes=processes, fragment_size=fragment_size)
	y2 = bam.array(tss_without_cpg, bins=bins, processes=processes, fragment_size=fragment_size)

	plt.rcParams['font.size'] = 11
	plt.rcParams['font.family'] = 'Arial'

	plt.plot(x, y1.mean(axis=0), label='with cpg', color='k')
	plt.plot(x, y2.mean(axis=0), label='without cpg', color='r', linestyle='--')
	plt.legend(loc='best')
	plt.xlabel('Distance from TSS (bp)')
	plt.ylabel('Mean H3K4me3 read density')
	plt.show()

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 11.5 years ago by Ryan Dale 5.0k

0

Thank you for the script. Could you please tell me how to install the metaseq package?

ADD REPLY • link 11.5 years ago by ChIP ▴ 600

0

I updated the script to have some installation instructions. It depends on the scientific Python stack, which can be troublesome to install -- see http://www.scipy.org/install.html if you run into issues.

ADD REPLY • link updated 2.3 years ago by Ram 45k • written 11.5 years ago by Ryan Dale 5.0k

0

Hi! I have a unique error with this line

tss = tss.window(w=5000)

and as soon as I mark this line as commented, the whole script works fine.

The error says

Command was:

bedtools window -a tss.bed -w 5000

Error message was:

*****ERROR: Need -a and -b files.

I think, the command should be:

tss = tss.window(cpg,w=5000)

Could you be kind enough to suggest an edit to your script?

Thank you

ADD REPLY • link updated 5.3 years ago by Ram 45k • written 11.5 years ago by ChIP ▴ 600

0

Sorry, I had used the wrong BEDTools program -- it should be "slop" instead of "window". The intent was to extend each single-bp TSS feature out to a total size of 10 kb. Thanks for catching this; I edited the script.
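To make the difference concrete: "window" compares two files (hence the "Need -a and -b files" error), whereas "slop" simply grows each feature in a single file. The plain-Python sketch below mimics that extension for one interval. It is only an illustration of the idea, not the BEDTools implementation; the helper name `slop_interval` and the coordinates are made up, and the chromosome length is the value I believe hg19 uses for chr21.

```python
def slop_interval(start, end, b, chrom_size):
    """Extend the half-open interval [start, end) by b bp on each side.

    The result is clipped so it never runs off either end of the chromosome.
    """
    return max(0, start - b), min(chrom_size, end + b)


CHR21_SIZE = 48129895  # assumed hg19 chr21 length, used only for clipping

# A single-bp TSS in BED coordinates (hypothetical position).
tss_start, tss_end = 1000000, 1000001
new_start, new_end = slop_interval(tss_start, tss_end, 5000, CHR21_SIZE)
print(new_start, new_end, new_end - new_start)

# A TSS close to the chromosome start gets clipped at 0.
print(slop_interval(100, 101, 5000, CHR21_SIZE))
```

Note that features near a chromosome end come out shorter than 10 kb after clipping, which is worth keeping in mind when every region is later split into the same number of bins.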

ADD REPLY • link 11.5 years ago by Ryan Dale 5.0k

0

Hi. I am a bit of a newbie, so I apologise in advance, however I haven't found any answers elsewhere. I have tried to run the script above, but when I do

>>> y1 = bam.array(tss_with_cpg, bins=bins, processes=processes, fragment_size=fragment_size)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/metaseq/_genomic_signal.py", line 122, in array
    chunksize=chunksize, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/metaseq/array_helpers.py", line 382, in _array_parallel
    itertools.repeat(kwargs)))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
TypeError: 'NoneType' object is not iterable

I have tried with several bam files. Why is the bam metaseq._genomic_signal.BamSignal object not iterable? Or have I missed something?

Thank you very much in advance.

ADD REPLY • link updated 5.3 years ago by Ram 45k • written 10.6 years ago by adira.mollari • 0

1

Lots of improvements have been made to metaseq since this was first posted, and the git branch referenced in the script is no longer valid.

I've just updated the script with 1) a reference to the latest installation instructions and 2) instructions for getting example data, so you don't need your own data to run this. Please report back (either here or on the github issues page) if this still doesn't work for you.

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.6 years ago by Ryan Dale 5.0k

0

Will definitely try this now, works directly from BAM!

BTW, if I want to subtract the forward-reverse reads in a BAM file, especially for histone ChIP-seq, how do I get the subtracted reads from the BAM? This subtracted BAM file could then be used for plotting.

ADD REPLY • link updated 5.3 years ago by Ram 45k • written 10.6 years ago by Chirag Nepal ★ 2.4k

0

Not sure what you mean by "subtract the forward-reverse reads". But you can manipulate your BAM using other means (e.g., samtools) and then read the new BAM with metaseq as in the example.

Also, see the BamSignal.local_coverage docs for other arguments that might be helpful for you -- specifically fragment_size, shift_width, read_strand, and stranded.
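For intuition about what a fragment-size extension does to a coverage profile, here is a small numpy sketch. It is not metaseq's implementation, and the exact semantics of the metaseq arguments may differ; the function `extended_coverage` and the toy reads are hypothetical. Each read is a `(start, end, strand)` tuple, and reads shorter than `fragment_size` are extended in their 3' direction before coverage is counted.

```python
import numpy as np


def extended_coverage(reads, region_start, region_end, fragment_size=None):
    """Per-base coverage over [region_start, region_end).

    If fragment_size is given, each shorter read is extended 3' to that total length:
    '+' reads grow to the right, '-' reads grow to the left.
    """
    cov = np.zeros(region_end - region_start, dtype=int)
    for start, end, strand in reads:
        if fragment_size is not None and fragment_size > end - start:
            if strand == '+':
                end = start + fragment_size
            else:
                start = end - fragment_size
        lo = max(start, region_start) - region_start
        hi = min(end, region_end) - region_start
        if lo < hi:
            cov[lo:hi] += 1
    return cov


# Two 36-bp reads flanking a hypothetical nucleosome, one per strand.
reads = [(100, 136, '+'), (300, 336, '-')]
raw = extended_coverage(reads, 0, 500)
ext = extended_coverage(reads, 0, 500, fragment_size=200)

# Summarize the 500-bp region into 100 bins of 5 bp each.
binned = ext.reshape(100, -1).mean(axis=1)
print(raw[200], ext[200], binned[40])
```

Without extension the two reads leave a gap between them; with a 200-bp fragment size they overlap in the middle, which is closer to where the ChIP'd fragment actually sat.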

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.6 years ago by Ryan Dale 5.0k

1

11.5 years ago

Chirag Nepal ★ 2.4k

You can have a look here http://crazyhottommy.blogspot.no/2013/04/how-to-make-tss-plot-using-rna-seq-and.html

cheers

ADD COMMENT • link 11.5 years ago by Chirag Nepal ★ 2.4k

0

I essentially learned that from the HTSeq package documentation (http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html#tss). Note that the y-axis is not normalized to counts per million (CPM). For that, you need to count the total number of reads (you can do it with HTSeq, but it is rather slow; samtools is better) and divide the numpy array by that total expressed in millions.
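The CPM scaling can be sketched as follows, assuming the profile is already a numpy array of raw counts and the library size has been obtained separately. The function name `to_cpm` and all numbers are made up for illustration.

```python
import numpy as np


def to_cpm(profile, total_mapped_reads):
    """Scale raw per-bin counts to counts per million mapped reads."""
    return np.asarray(profile, dtype=float) * 1e6 / total_mapped_reads


# The library size could come from, e.g., `samtools view -c -F 4 sample.bam`,
# which counts alignments that are not flagged as unmapped.
total_mapped_reads = 20000000  # hypothetical value

profile = np.array([5, 20, 80, 20, 5])  # hypothetical raw counts per bin
cpm = to_cpm(profile, total_mapped_reads)
print(cpm)
```

After this scaling, profiles from libraries of different sequencing depth can be drawn on the same y-axis.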

ADD REPLY • link 11.4 years ago by Ming Tommy Tang ★ 4.6k

1

11.5 years ago

KCC ★ 4.1k

Assuming you have your TSS and CpG files in BED format, you can use a bedtools command such as intersectBed to separate the TSSs that overlap CpG regions from those that don't. Then you can use an online tool like Cistrome or CEAS to get a profile across your BED regions.
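The split itself is simple enough to sketch in plain Python, which may help when checking what the bedtools step returns (it corresponds to `intersect` with the `-u` and `-v` options). This is a naive all-against-all comparison meant for illustration on small inputs; the function names and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def split_by_overlap(tss_list, cpg_list):
    """Partition TSS intervals into those overlapping any CpG interval and the rest."""
    with_cpg, without_cpg = [], []
    for chrom, start, end in tss_list:
        hit = any(c == chrom and overlaps(start, end, s, e) for c, s, e in cpg_list)
        (with_cpg if hit else without_cpg).append((chrom, start, end))
    return with_cpg, without_cpg


# Made-up BED-style (chrom, start, end) records.
tss = [('chr21', 1000, 1001), ('chr21', 5000, 5001), ('chr22', 1000, 1001)]
cpg = [('chr21', 900, 1200)]

tss_with_cpg, tss_without_cpg = split_by_overlap(tss, cpg)
print(tss_with_cpg)
print(tss_without_cpg)
```

For genome-scale files, bedtools is far faster than this quadratic loop.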

Sorry, I can't be more specific as you haven't mentioned any of the file formats that you are using. I can adjust my answer if you do.

ADD COMMENT • link 11.5 years ago by KCC ★ 4.1k

0

It is a BAM file.

ADD REPLY • link 11.5 years ago by ChIP ▴ 600

0

SeqMiner should do what you are looking for.

ADD REPLY • link 11.5 years ago by kanwarjag ★ 1.2k

0

11.4 years ago

Ian 6.1k

I recently discovered NGS PLOT on biostars. It is R-based and can work from the command line or via GALAXY. https://code.google.com/p/ngsplot/

ADD COMMENT • link 11.4 years ago by Ian 6.1k

0

11.4 years ago

bede.portz ▴ 540

I think HOMER may be able to do what you are asking.

HOMER manual for annotatePeaks.pl

If you carry out the detailed annotation, the output should contain information about the distance of peaks to many known genomic features, including CpG islands. I think you could parse the information you want from this file.

I provided a link to the manual for the command that may give you what you need, so you can check whether HOMER provides the functionality you want. I would suggest reading the preceding parts of the manual in their entirety: HOMER does some things, like normalization, that you may or may not want, and there are steps you will need to carry out before using annotatePeaks.pl, including downloading the version of your genome of interest via HOMER, making tag directories, etc.

I wish I could provide more detailed and expert advice, but I am new to this analysis myself.

Someone suggested SeqMiner. My concern with SeqMiner is the poor documentation. It can generate heatmaps from data VERY quickly, which is attractive, but it seems to be more of a black box than HOMER. I think SeqMiner gives you less control over the analysis, and/or the control you do have is less intuitive, at least to me.

ADD COMMENT • link 11.4 years ago by bede.portz ▴ 540

0

9.8 years ago

Ram ▴ 190

Dear Ryan,

Is it possible to find out, using this script, why there is a shift of +2 kb for a particular histone mark, when the signal should be centered on the TSS?

Thanks a lot.

ADD COMMENT • link 9.8 years ago by Ram ▴ 190
