How to plot ChIP-seq Density vs Distance from TSS using Homer annoted files
0
2
Entering edit mode
8.3 years ago
varsha619 ▴ 90

Hi, Sorry in advance if my question is too basic. I am new to ChIP-seq analysis and I have a list of .anno files from Homer annotatepeaks.pl (which annotates .bed files). I would like to plot a Density vs Distance from TSS graph using Homer .anno files. I saw different packages available (such as R ChIPseeker) that use .bed files as input to generate the same, after annotating the peaks as well. I was wondering if there is a package that uses the already annotated Homer output .anno files as input to generate a similar plot. Please let me know, thank you for your help!

ChIP-Seq homer • 9.0k views
ADD COMMENT
1
Entering edit mode

You're looking for a density plot, or metagene where the x axis is 'region distance' and y axis is 'chip-seq signal' correct?

If this is the case then HOMER can do this fairly easily. You'll need a bed file of your regions of interest, and a bedGraph or wig file of your ChIP-seq data. Then you can run something similar to:

annotatePeaks.pl $REGIONSBED $GENOME -size $REGIONDISTANCE -hist 50 -bedGraph $BEDGRAPHOFCHIPSEQ > $OUTPUTFILENAME

Where $REGIONSBED = the bed file of your regions of interest, $GENOME = the genome of your ChIP-seq data such as hg19, or mm9, etc, $REGIONDISTANCE = the number of basepairs the metagene plot should cover typically for TSS, 6000 is fine for hg19 (HOMER takes the number you input here and divides it by 2 and then spans that so for 6000 it would be 3000 +/- bp in either direction from the central point), and $BEDGRAPHOFCHIPSEQ = a bedgraph file of your chip-seq data.

Once you have the output file, you can plot the coverage column in Excel.

ADD REPLY
0
Entering edit mode

Thank you Sinji, just to confirm is the -bedGraph $BEDGRAPHOFCHIPSEQ generated in this command here or is it to be generated elsewhere and given as input here along with the .bed file?

ADD REPLY
0
Entering edit mode

Yes, you can generate this using any number of tools. Bedtools, and deeptools are two easy to use programs that can do this for you.

ADD REPLY
0
Entering edit mode

Hi, I used the MACS -B option to generate the bedGraphs = macs14 -t 1.sam -c 2.sam -f SAM -g dm -n out -B. This gave me a bedGraph output folder which I used for annotatePeaks.pl which for some reason gave me all the coverage values as 0. Would you please help me fix this? I appreciate your help.

ADD REPLY
0
Entering edit mode

Can you do a head of your bedgraph file?

ADD REPLY
0
Entering edit mode

This is what the 'head' looks like -

track type=bedGraph name="NT60_treat_chr2L" description="Extended tag pileup from MACS version 1.4.3 20131216" chr2L 288 351 1 chr2L 351 367 2 chr2L 367 387 1

ADD REPLY
0
Entering edit mode

Try head -n 20 ... I want to see what your chr notation looks like.

Otherwise, it might be all that metadata that's messing up HOMER. You could try cutting out all the information before you get to your typical bedgraph coverage information:

chr1    10000    10500

and see if that works, if that makes any sense.

ADD REPLY
0
Entering edit mode

Hi, I'm sorry may be this is an long time ago issue.

But can I get some advice for making this plot from the center of the peaks? For example, the example you said before is like -3000bp to 3000bp. When I plot this values, can I make the 0bp point of the plot being the center of the peaks?

It'll be very grateful to have some advice. Thanks a lot

Woongjae

ADD REPLY
0
Entering edit mode

Hi @Woongjae, in the Homer command by @Sinji if you use -size 6000, you would get -3000 to 3000bp plots. annotatePeaks.pl $REGIONSBED $GENOME -size $REGIONDISTANCE -hist 50 -bedGraph $BEDGRAPHOFCHIPSEQ > $OUTPUTFILENAME

As the thread mentions, the bed and bedGraph files can be generated by various softwares, I typically use MACS14 -B option. This gives output .bdg files in .gz format for each chr with a header, I used the linux command - sed '1d' to remove the header line and concatenate all the chr files into 1 .bdg file.

ADD REPLY
0
Entering edit mode

Hi varsha! Thank you for the reply. I get what you mean. Thank you very much. What I'm curios about is, that if I get the histogram values by homer, do the peaks' summit position(highest tag density of each peaks) go to the center of the -size? For example if the range I got was -3000~3000bp, are the positions of each peaks are in the 0bp? Maybe because of my poor English, I'm not sure you could get the point...Sorry for that.

Thank you very much!

ADD REPLY
1
Entering edit mode

To make the plot the center of a peak you have to manually adjust your peak file so that the start and end position are both the middle of the peak.

Something like awk -v OFS='\t' '{print $1, ($2+$3)/2, ($2+$3)/2, $4, $5, $6}' $PEAKFILE > $OUTPUT ... this is assuming your peak file is a standard BED format.

Original PEAKFILE:

chr1 100 200 peak1 0 .

chr1 300 500 peak2 0 .

Edited PEAKFILE:

chr1 150 150 peak1 0 .

chr1 400 400 peak2 0 .
ADD REPLY
0
Entering edit mode

Thank you Sinji!

Your comments really helped me!

ADD REPLY
0
Entering edit mode

This is the head output - track type=bedGraph name="NT60_treat_chr2L" description="Extended tag pileup from MACS version 1.4.3 20131216"

chr2L 288 351 1

chr2L 351 367 2

chr2L 367 387 1

chr2L 387 430 2

chr2L 430 466 1

chr2L 809 845 1

chr2L 845 888 2

Another thing I noticed was that the bedGraph input I had tried giving Homer was the whole bedGraph output folder from MACS (since there were multiple .gz output files in the folder for each chromosome). Could this be the issue? Please let me know what you think, thanks!

ADD REPLY
1
Entering edit mode

Yes that could be a issue, your best bet might be to remove the first track line of every file, cat them together, and then unzip them and feed that file into HOMER. It requires a very specific format, which is unfortunate, but necessary I suppose.

ADD REPLY
0
Entering edit mode

Oops sorry I forgot to "Add reply", I will try that suggestion. Thank you so much!

ADD REPLY
0
Entering edit mode

This worked! Thanks..

ADD REPLY
0
Entering edit mode

Also, how do I find out if the coverage value in the output is per bp or million bp?

ADD REPLY
0
Entering edit mode

This depends on your bedgraph input. If you input a bedgraph file that has been normalized to million bp then your coverages are normalized as well. If you have MACS2 output, I think it has a parameter you can set to output normalized bedGraphs, but you'd have to double check your command and the MACS2 manual.

ADD REPLY
0
Entering edit mode

@Sinji Can you please help me with this question. Thanks

ADD REPLY
0
Entering edit mode

Please use ADD REPLY when responding to existing posts.

ADD REPLY

Login before adding your answer.

Traffic: 2326 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6