Plotting Reads Around Tss
6
6
Entering edit mode
11.1 years ago
ChIP ▴ 600

Hi!

I am almost sure, guys who are actively involved in ChIP-Seq data analysis have plotted the mapped reads around the TSS in a certain window(say 10 Kb).

What I want to do is to plot reads of my histone marks (in bam file) around TSS with CpG and TSS without CpG (Essentially a coverage profile).

It would be very kind of you, if you could share the script that you had used for the same.

Kindly help.

chip-seq • 15k views
ADD COMMENT
0
Entering edit mode

I would not recommend to plot raw read counts directly from BAM as this may give you false impressions due to biases. For example, promoters by default have more reads do to GC and open chromatin bias. Thus, it is better to use log2 ratios or the difference between treatment and input.

ADD REPLY
5
Entering edit mode
11.1 years ago
Ryan Dale 5.0k

The Python package metaseq (docs) will do this directly from BAM files. Here's an example gist showing how you'd use it:

(edit: fix embedded gist)

Also, you may want to see this worked example from the docs: https://pythonhosted.org/metaseq/example_session.html

ADD COMMENT
0
Entering edit mode

Thank you for the script. Could you please, tell me how to install this package metaseq.

ADD REPLY
0
Entering edit mode

I updated the script to have some installation instructions. It depends on the scientific Python stack, which can be troublesome to install -- see http://www.scipy.org/install.html if you run into issues.

ADD REPLY
0
Entering edit mode

Hi! I have a unique error with this line

tss = tss.window(w=5000)

and as soon as I mark this line as commented, the whole script works fine.

The error says

Command was:

bedtools window -a tss.bed -w 5000

Error message was:

*****ERROR: Need -a and -b files.

I think, the command should be:

tss = tss.window(cpg,w=5000)

Could you be kind enough to suggest an edit in your script.

Thank you

ADD REPLY
0
Entering edit mode

Sorry, I had used the wrong BEDTools program -- it should be "slop" instead of "window". The intent was to extend single-bp TSS feature out to a total size of 10kb. Thanks for catching this; I edited the script.

ADD REPLY
0
Entering edit mode

Hi. I am a bit of a newbie, so I apologise in advance, however I haven't found any answers elsewhere. I have tried to run the script above, but when I do

>>> y1 = bam.array(tss_with_cpg, bins=bins, processes=processes, fragment_size=fragment_size)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/metaseq/_genomic_signal.py", line 122, in array
    chunksize=chunksize, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/metaseq/array_helpers.py", line 382, in _array_parallel
    itertools.repeat(kwargs)))
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
TypeError: 'NoneType' object is not iterable

I have tried with several bam files. Why is the bam metaseq._genomic_signal.BamSignal object not iterable? Or have I missed something?

Thank you very much in advance.

ADD REPLY
1
Entering edit mode

Lots of improvements have been made to metaseq since this was first posted, and the git branch referenced in the script is no longer valid.

I've just updated the script with 1) a reference to the latest installation instructions and 2) instructions for getting example data, so you don't need your own data to run this. Please report back (either here or on the github issues page) if this still doesn't work for you.

ADD REPLY
0
Entering edit mode

Will definitely try this now, works directly from BAM!

BTW, if I want to subtract the forward-reverse reads on BAM, especially for histone Chip-seq, how do I get the subtracted read from BAM. Then this subtracted BAM file could be used for plotting.

ADD REPLY
0
Entering edit mode

Not sure what you mean by "subtract the forward-reverse reads". But you can manipulate your BAM using other means (e.g., samtools) and then read the new BAM with metaseq as in the example.

Also, see the BamSignal.local_coverage docs for other arguments that might be helpful for you -- specifically fragment_size, shift_width, read_strand, and stranded.

ADD REPLY
1
Entering edit mode
ADD COMMENT
0
Entering edit mode

I essentially learned that from the HTSeq package document http://www-huber.embl.de/users/anders/HTSeq/doc/tss.html#tss note that, the y axis is not normalized to count per million (cpm), for that purpose, one needs to count the total reads number (you can do it with HTSeq,but it is rather slow, samtools will be better), and divide the numpy array by it.

ADD REPLY
1
Entering edit mode
11.1 years ago
KCC ★ 4.1k

Assuming you have your TSS and CpG files in BED format. You can use either a command from bedtools like intersectBED to get TSS which have CpG regions and ones that don't. Then you use an online tool like Cistrome or CEAS to get a profile across your BED regions.

Sorry, I can't be more specific as you haven't mentioned any of the file formats that you are using. I can adjust my answer if you do.

ADD COMMENT
0
Entering edit mode

It is a BAM file.

ADD REPLY
0
Entering edit mode

Seqminer should do what you are looking for

ADD REPLY
0
Entering edit mode
11.1 years ago
Ian 6.1k

I recently discovered NGS PLOT on biostars. It is R-based and can work from the command line or via GALAXY. https://code.google.com/p/ngsplot/

ADD COMMENT
0
Entering edit mode
11.1 years ago
bede.portz ▴ 540

I think HOMER may be able to do what you are asking.

HOMER manual for annotatePeaks.pl

If you carry out the detailed annotation, the output should contain information about the distance of peaks to many known genomic features, including CpG islands. I think you could parse the information you want from this file.

I provided a link to the manual for the command that may give you what you need so you can try to ascertain whether or not HOMER can provide the functionality you desire. I would suggest reading the preceding parts of the manual in its entirety, as HOMER will do some things like normalization that you may or may not want, and there are preceding steps you will need to carry out before using annotatePeaks.pl including downloaded the version of your genome of interest via HOMER, making tag directories, etc.

I wish I could provide more detailed and expert advice, but I am new to this analysis myself.

Someone suggested SeqMiner. My concern with SeqMiner is the poor documentation. It can generate heatmaps from data VERY quickly, which is attractive, but seems to be more of a black box than HOMER. I think SeqMiner give you less control over the analysis, and/or the control you do have is less intuitive, at least to me.

ADD COMMENT
0
Entering edit mode
9.5 years ago
Ram ▴ 190

Dear Ryan,

Is it possible to know by using this script why there is shift of +2kb for particular histone mark but rather it has to be on TSS ?

Thanks a lot.

ADD COMMENT

Login before adding your answer.

Traffic: 1922 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6