Question

HOMER annotatePeaks.pl problem

2

Entering edit mode

7.2 years ago

Seq225 ▴ 110

I am using HOMER to get read mapping around TE (well, I need to make a heatmap). I am using following command:

annotatePeaks.pl  XXX.bed  XXX(genome)   -bedGraph XXX.bedgraph -size 200 -hist 2 -ghist  -noadj -fragLength 0   > XXX.txt

Before running this command, I converted my bed file using changeNewLine.pl and configured my genome in HOMER. I have been using this tool and so this command for couple of weeks now. From terminal result it seems ok. Here is the result in terminal:

Peak file = XXX.bed
Genome = XXX
Organism = null
bedGraph Files:
XXX.bedgraph
Peak Region set to 200
-----------------------------------------------------
Histogram mode activated (bin size = 2 bp)
-----------------------------------------------------
Will create histogram for each gene
Will NOT normalize tag counts
Fragment Length set to 0
Peak/BED file conversion summary:
BED/Header formatted lines: 4203
peakfile formatted lines: 0
Duplicated Peak IDs: 0

Peak File Statistics:
Total Peaks: 4203
Redundant Peak IDs: 0
Peaks lacking information: 0 (need at least 5 columns per peak)
Peaks with misformatted coordinates: 0 (should be integer)
Peaks with misformatted strand: 0 (should be either +/- or 0/1)

Peak file looks good!

Resizing peaks...
Reading Positions...
-----------------------
Compiling per bp Histograms...
Finding Tags in Peaks from each directory...

Order of experiments in output file:
XXX.bedgraph

But, the number of rows in my output file and number in input bed file are not same. Number in output file is much less. Would you please tell me what went wrong and how can I get all rows in my output file?

Thank you very much and I am very sorry to bother.

RNA-Seq ChIP-Seq HOMER • 11k views

ADD COMMENT • link updated 18 months ago by Ram 44k • written 7.2 years ago by Seq225 ▴ 110

0

Entering edit mode

ChIPseeker R package is a good to option to analyze chipseq

ADD REPLY • link 7.2 years ago by tarek.mohamed ▴ 370

0

Entering edit mode

I have done it!! I used deeptools in galaxy and options were quite simple there.

Thank you very very much for suggesting deepTools!!

ADD REPLY • link 7.2 years ago by Seq225 ▴ 110

0

Entering edit mode

Great news! I also eventually migrated from using HOMER to deepTools.

Best of luck.

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

0

Entering edit mode

I also meet that problem. But with the different thing is I use tag file to run it. Indeed it will successfully run but got an empty table of the histogram. So I'm so confused about Homer tools Here's the error message

Using Custom Genome
Peak file = XXX.Bed
Genome = hg38.fa.gz
Organism = unknown
Peak Region set to 6000
-----------------------------------------------------
Histogram mode activated (bin size = 10 bp)
-----------------------------------------------------
Tag Directories:
    /XXX_Homer_tag/
    /CCC_Homer_tag/
Peak/BED file conversion summary:
    BED/Header formatted lines: 1674
    peakfile formatted lines: 0
    Duplicated Peak IDs: 0

Peak File Statistics:
    Total Peaks: 1674
    Redundant Peak IDs: 0
    Peaks lacking information: 0 (need at least 5 columns per peak)
    Peaks with misformatted coordinates: 0 (should be integer)
    Peaks with misformatted strand: 0 (should be either +/- or 0/1)

Peak file looks good!

Resizing peaks...
Reading Positions...
-----------------------
Compiling per bp Histograms...
Finding Tags in Peaks from each directory...
Ratio for /XXX_Homer_tag/ : 0.296882302376881
Ratio for /CCC_Homer_tag/ : 0.185192555162982
Processing tags from /XXX_Homer_tag/
Processing tags from /CCC_Homer_tag/

ADD REPLY • link 4.2 years ago by jimmy0958073736 ▴ 40

0

Entering edit mode

Empty table
-3000   0   0   0   0   0   0
-2990   0   0   0   0   0   0
-2980   0   0   0   0   0   0
-2970   0   0   0   0   0   0
-2960   0   0   0   0   0   0
-2950   0   0   0   0   0   0
-2940   0   0   0   0   0   0
-2930   0   0   0   0   0   0
-2920   0   0   0   0   0   0
-2910   0   0   0   0   0   0
-2900   0   0   0   0   0   0
-2890   0   0   0   0   0   0
-2880   0   0   0   0   0   0
-2870   0   0   0   0   0   0
-2860   0   0   0   0   0   0
-2850   0   0   0   0   0   0
-2840   0   0   0   0   0   0
-2830   0   0   0   0   0   0
-2820   0   0   0   0   0   0
-2810   0   0   0   0   0   0
-2800   0   0   0   0   0   0
-2790   0   0   0   0   0   0
-2780   0   0   0   0   0   0
-2770   0   0   0   0   0   0
-2760   0   0   0   0   0   0
-2750   0   0   0   0   0   0
-2740   0   0   0   0   0   0
-2730   0   0   0   0   0   0
-2720   0   0   0   0   0   0
-2710   0   0   0   0   0   0
-2700   0   0   0   0   0   0
-2690   0   0   0   0   0   0
-2680   0   0   0   0   0   0
-2670   0   0   0   0   0   0
-2660   0   0   0   0   0   0
-2650   0   0   0   0   0   0
-2640   0   0   0   0   0   0
-2630   0   0   0   0   0   0
-2620   0   0   0   0   0   0
-2610   0   0   0   0   0   0
-2600   0   0   0   0   0   0
-2590   0   0   0   0   0   0
-2580   0   0   0   0   0   0
-2570   0   0   0   0   0   0
-2560   0   0   0   0   0   0
-2550   0   0   0   0   0   0
-2540   0   0   0   0   0   0
-2530   0   0   0   0   0   0
-2520   0   0   0   0   0   0
-2510   0   0   0   0   0   0

ADD REPLY • link updated 18 months ago by Ram 44k • written 4.2 years ago by jimmy0958073736 ▴ 40

0

Entering edit mode

I got the same problem. And this ended up with a disk space issue in my case. I don't have enough data storage...

ADD REPLY • link 18 months ago by Zhiguang • 0

score 6 · Accepted Answer · 2017-09-27

6

Entering edit mode

7.2 years ago

Kevin Blighe 88k

If you just want to get gene annotation for your BED regions, you only need to use annotatePeaks like this:

perl annotatePeaks.pl MyBED.bed mm9 -annStats AnnotationStats.txt > Annotation.tsv

This always produces an equal number of annotated regions as per my BED file. The correct genome has to be used of course.

HOMER will assume that the BED file regions define your peaks. I don't believe that you have t supply all of the other information such as the bedGraph, peak size, etc.

ADD COMMENT • link 7.2 years ago by Kevin Blighe 88k

0

Entering edit mode

Thanks. But I want to make an heatmap. That's the actual purpose. For heatmap -hist and -ghist are necessary. -size 200 is for defining the regions that I did data on.....

I am not sure how the genome file is working here. The bed file should work on the bedgraph file (which is the actual mapped file, which I created by bowtie2, samtools, and bedtools genomecov). bed file has the coordinates around which (defined by -size 200) I need the mapping numbers (probably defined by tags in the main homer manual).

It is highly likely though that I have misunderstood some options completely!!

ADD REPLY • link 7.2 years ago by Seq225 ▴ 110

1

Entering edit mode

Ah, I see. I also used annotatePeaks when I was generating a heatmap, however, I used cluster3 to build the actual heamap based on annotatePeak's output. Here were my commands:

perl annotatePeaks.pl FinalPeaks.txt mm9 -size 2000000 -hist 5000 -norm 1000000 -ghist -d MyEmptyVectorTagDirectory/ MyPositiveControlTagDirectory/ MyTargetMarkerTagDirectory/ > ForHeamap.Blueprint.txt
cluster3 -f ForHeatmap.Blueprint.txt -g 2 -m c
cluster3 -f ForHeatmap.Blueprint.txt -k 10

I believe that FinalPeaks.txt is just the standard peaks file produced by HOMER, but cannot recall exactly.

Also, there was another recent question specifically about visualisation of ChIP-seq data, and in my answer I mentioned how heatmaps can also be done with deepTools: A: Visualization for ChIP-seq analysis

PS - my marker of interest covered very large regions, but your marker is obviously binding to more narrow regions

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

1

Entering edit mode

I will try deepTools for sure.

However, I am also using cluster3. I am using the homer output file annotatePeaks.pl XXX.bed XXX(genome) -bedGraph XXX.bedgraph -size 200 -hist 2 -ghist -noadj -fragLength 0 > XXX.txt) XXX.txt in cluster3 to cluster them, which produce a .cdt file. I use the .cdt file for making heatmap in Java Treeview.

Cluster3 and Treeview are working fine. My problem is the XXX.txt file that homer gives me. It does not have all the entries of the bed file.

I will try your command to use homer and cluster3 together.

ADD REPLY • link 7.2 years ago by Seq225 ▴ 110

0

Entering edit mode

Yes, we are using the function slightly differently: I just supply a single peaks file that contains my regions of interest (with first 3 columns in BED format), and then also a few tags directories for my samples. The function then generates the histogram information over my peaks using the tags information for each sample.

Please let me know if you have any success! I do recall having to work through trial and error generally with HOMER's functions. However, the images of Homer Simpson and the doughnut on the website made it easier.

Edit: the first thing that I suggest you change is -fragLength 0 to -fragLength auto, but perhaps you have already tried this

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

0

Entering edit mode

Also, as you've supplied a bedGraph file and not a tag directory, I believe that HOMER will look for 'tags' (reads) in your bedGraph file for the regions defined in your input BED. If there are no tags, or the regions in both the BED and bedGraph don't overlap, then by all means some regions won't appear in the final output.

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

0

Entering edit mode

change -fragLength 0 to -fragLength auto does not make any difference. Your point about bedgraph and bed overlap is what I thought initially. But I want all rows. Moreover, I checked the output file, there are actually some rows that do not have any tags........I don't know what is actually happening.

ADD REPLY • link 7.2 years ago by Seq225 ▴ 110

0

Entering edit mode

Okay, apologies that they did not work.

Did you try using the tag directories like I did (instead of supplying the bedGraph)? I just checked my data and I have the same number of regions/rows in my input file as I do in the file then going into cluster3/Treeview

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

1

Entering edit mode

Please don't apologize, I guess you are directing me to the right pipeline. I am working on the tagdirectory option. However, I am getting different results, and also the number of rows is still different.

Here is what I used: makeTagDirectory tagdir -keepAll -single -flip a.bam

I need to figure out the right command for making the tag directory. My data is actually small-RNA seq data produced in Illumina platform in un-paired fashion. I am still reading the manual to get the right command!!

Thanks a lot for your input!!!!

ADD REPLY • link 7.2 years ago by Seq225 ▴ 110

1

Entering edit mode

Okay, here's one final throw of the dice! Here are all of my commands that led up to the cluster3 step. I was using the mm9 genome, and in this example I allude to just 2 sample tag directories: MySampleTags/ and MyInputControlTags/

#Alignment
bowtie2 -p 7 -x mm9 -U MySample.fastq.gz > MySample.sam

#Remove PCR duplicates
samtools view -bS MySample.sam > MySample.bam
samtools sort MySample.bam MySample_Sorted
java -jar picard.jar MarkDuplicates INPUT=MySample_Sorted.bam OUTPUT=MySample_Sorted_PCRDuped.bam ASSUME_SORTED=true METRICS_FILE=MySample_Sorted_PCRDuped.txt VALIDATION_STRINGENCY=SILENT
samtools view -b -F 0x400 MySample_Sorted_PCRDuped.bam > MySample_Sorted_PCRDupesRemoved.bam
samtools index MySample_Sorted_PCRDupesRemoved.bam
samtools view -h MySample_Sorted_PCRDupesRemoved.bam > MySample_Sorted_PCRDupesRemoved.sam

#Make tag directory
makeTagDirectory MySampleTags/ MySample_Sorted_PCRDupesRemoved.sam -unique -mapq 30 -genome mm9 -checkGC -normGC default -precision 3

#Remove out of bounds reads (optional)
perl removeOutOfBoundsReads.pl MySampleTags/ mm9

#Identify peaks
findPeaks MySampleTags/ -i MyInputControlTags/ -gsize 3500000000 -style histone -size 1500 -minDist 15000 -region -nfr -fdr 0.00001 -F 2 -P 0.0001 -o MySampleTags/MySample_FinalPeaks.txt

#Annotate the peaks
perl annotatePeaks.pl MySampleTags/MySample_FinalPeaks.txt mm9 -annStats MySampleTags/MySample_FinalPeaks_AnnotStats.txt > MySampleTags/MySample_FinalPeaks_Annot.txt

#Plotting tag densities around defined peaks
perl annotatePeaks.pl MySampleTags/MySample_FinalPeaks.txt mm9 -size 15000 -hist 100 -norm 1000000 -d MySampleTags/ MyInputControlTags/ > PeaksDensities.txt

#Plot a 'blueprint' of the data, then cluster using Cluster 3.0 (sudo apt-get install cluster3) with Spearman rank correlation values and centroid linkage, and finally view in TreeView
perl annotatePeaks.pl MySampleTags/MySample_FinalPeaks.txt mm9 -size 2000000 -hist 5000 -norm 1000000 -ghist -d MySampleTags/ MyInputControlTags/ > HeatmapBlueprint.txt
cluster3 -f HeatmapBlueprint.txt -g 2 -m c
cluster3 -f HeatmapBlueprint.txt -k 10

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

0

Entering edit mode

Thanks a ton!

I will try your commands. I am also working on DeepTools. Hopefully, either one will work. Cheers!!

ADD REPLY • link 7.2 years ago by Seq225 ▴ 110

1

Entering edit mode

No problem. I really like Homer but I also found it difficult to follow and piece together a working pipeline. I spent many long days skipping lunch with my head buried into my laptop.

Good luck!

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

0

Entering edit mode

@Kevin how do i see the occupancy of a gene with its target genes from this homer analysis ,idea is to see the occupancy of a gene to its target genes upstream and downstream of the TSS like 500 or 1000 bp

can you give me some direction ? how do i proceed after doing the peak annotation

ADD REPLY • link 5.7 years ago by 1769mkc ★ 1.2k

1

Entering edit mode

Hey krushnach, what was your ChIP-seq target? - a transcription factor? If you have your peaksalready identified, then you could literally just add 500 or 1000 to the start and end co-ordinates and then annotate that.