Question

annotatePeaks.pl Function From HOMER

12.7 years ago

Dataminer ★ 2.8k

Hi!

I am using the annotatePeaks.pl script from HOMER to annotate my peaks with the nearest RefSeq gene. If you have used the tool, you will know that it generates 10 columns. The columns I am interested in are Distance to TSS, Nearest PromoterID, Nearest RefSeq ID, and Gene Name.

Problem: sometimes a peak is annotated with a nearest promoter ID and a distance to the TSS, while the Nearest RefSeq ID and Gene Name columns are blank.

Since I want the nearest gene to each peak, I need the Nearest RefSeq ID and Gene Name columns. If these two columns are blank while the distance to the nearest TSS is not, am I in trouble? Or is it normal for Nearest PromoterID to contain an ID while Nearest RefSeq ID and Gene Name are blank?

I am completely confused.

Thank you. Please try to answer.

genomics • 13k views

Written 12.7 years ago by Dataminer ★ 2.8k; last updated 9.6 years ago by bwshi2012.

I haven't seen a case like that. Can you post an example file with 5-10 lines? There are cases where Nearest Ensembl, Gene Name, and Gene Alias are empty but the peak is still annotated with a distance to TSS, and a Nearest PromoterID is given along with a Nearest RefSeq (in most cases the two are the same). Another case is when the RefSeq database has no gene annotation for that PromoterID. Try looking at the region in the UCSC Genome Browser to get a clue.

Reply by Sukhi Singh (11k), 12.7 years ago

Hi Sukhdeep! Here is an example from the output file:

Chr    Start       End         Dist_to_TSS   Promoter_ID   Nearest_RefSeq   Gene_Name
chr1   84743800    84744829    -248
chr2   120711360   120711771   -14111        NR_000034

In the first row only the distance to TSS is present and the remaining columns are empty; in the second row the distance to TSS and the promoter ID are present but the rest are empty.
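To check how many peaks fall into these two cases, one option is to scan the output for rows that have a distance to TSS but blank gene columns. The sketch below is a minimal illustration, assuming the output is tab-delimited (so an empty column is an empty string between tabs); the two rows and the header names are copied from the post, not from HOMER's real column headers, and `rows_missing_gene` is a hypothetical helper, so adjust `dist_col` and `gene_cols` to match your file.

```python
import csv
import io

# The two example rows from the post, re-encoded as tab-delimited text.
# Assumption: empty columns appear as empty strings between tabs.
EXAMPLE = (
    "Chr\tSt\tStp\tTSS\tPromoter_ID\tNearest_RefSeq\tgene_name\n"
    "chr1\t84743800\t84744829\t-248\t\t\t\n"
    "chr2\t120711360\t120711771\t-14111\tNR_000034\t\t\n"
)


def rows_missing_gene(text, dist_col="TSS", gene_cols=("Nearest_RefSeq", "gene_name")):
    """Return rows that have a distance to TSS but all gene columns blank."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    flagged = []
    for row in reader:
        # `or ""` guards against None when a row has fewer fields than the header.
        has_distance = (row.get(dist_col) or "").strip() != ""
        genes_blank = all((row.get(c) or "").strip() == "" for c in gene_cols)
        if has_distance and genes_blank:
            flagged.append(row)
    return flagged


for row in rows_missing_gene(EXAMPLE):
    print(row["Chr"], row["TSS"], row["Promoter_ID"] or "(no promoter ID)")
```

Both example rows are flagged; the second one still carries a promoter ID, which is the ID worth looking up in the annotation files.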

I have used hg18.

Reply by Dataminer ★ 2.8k, 12.7 years ago

score 2 · Answer 1 · 2012-08-16

The manual says:

By default, annotatePeaks.pl loads a file in the "/path-to-homer/data/genomes/<genome>/<genome>.tss" that contains the positions of RefSeq transcription start sites. It uses these positions to determine the closest TSS, reporting the distance (negative values mean upstream of the TSS, positive values mean downstream), and various annotation information linked to locus including alternative identifiers (unigene, entrez gene, ensembl, gene symbol etc.). This information is also used to link gene-specific information (see below) to a peak/region, such as gene expression.
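The nearest-TSS step the manual describes can be sketched as follows. This is not HOMER's code. It is a minimal illustration that assumes each TSS record has an ID, chromosome, coordinate, and strand, and that the distance is measured from the peak midpoint. The sign follows the manual's convention: negative means upstream of the TSS and positive means downstream, relative to the transcript's strand. `Tss` and `nearest_tss` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Tss:
    refseq_id: str
    chrom: str
    pos: int      # TSS coordinate
    strand: str   # "+" or "-"


def signed_distance(peak_pos, tss):
    """Distance from TSS to peak: negative = upstream, positive = downstream."""
    if tss.strand == "+":
        return peak_pos - tss.pos
    # On the minus strand, upstream lies at higher coordinates.
    return tss.pos - peak_pos


def nearest_tss(chrom, start, end, tss_list):
    """Return (tss, signed distance) for the TSS closest to the peak midpoint."""
    mid = (start + end) // 2  # assumption: distance is measured from the peak center
    candidates = [t for t in tss_list if t.chrom == chrom]
    if not candidates:
        return None, None
    best = min(candidates, key=lambda t: abs(mid - t.pos))
    return best, signed_distance(mid, best)
```

With hypothetical records `Tss("NM_A", "chr1", 1000, "+")` and `Tss("NM_B", "chr1", 5000, "-")`, a peak at chr1:700-900 maps to NM_A at -200, and a peak at chr1:5200-5400 maps to NM_B at -300 (upstream, because NM_B is on the minus strand). Note that this step only needs TSS positions, so a distance can be reported even when no gene name is attached to the matched ID.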

This .tss file contains the RefSeq accession numbers and their chromosomal positions. In most cases, once a peak is assigned to one of these coordinates, the tool cross-references the RefSeq ID with the gene aliases in another file to produce the 10-column output. So a RefSeq annotation may exist even though it is not in UCSC/Ensembl. I don't have hg18 with me, but try grepping for NR_000034 in the genome's .tss file and look up the same position in the browser to get the answer.
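The cross-referencing step described above behaves like a dictionary lookup: when the ID taken from the .tss file has no entry in the gene-annotation table, the gene columns stay blank while the distance is still reported. The sketch below only illustrates that idea. The table layout, the accession `NM_EXAMPLE1`, the symbol `GENE_X`, and the helpers `annotate` and `summarize` are all hypothetical, and this is not HOMER's actual file format or code.

```python
def annotate(promoter_id, gene_table):
    """Look up gene columns for a promoter ID; unknown IDs yield blank columns."""
    # Assumption: gene_table maps an accession to (refseq_id, gene_name).
    refseq_id, gene_name = gene_table.get(promoter_id, ("", ""))
    return {
        "Promoter_ID": promoter_id,
        "Nearest_RefSeq": refseq_id,
        "gene_name": gene_name,
    }


def summarize(promoter_ids, gene_table):
    """Annotate every ID and list the ones that ended up without a gene name."""
    rows = [annotate(p, gene_table) for p in promoter_ids]
    missing = [r["Promoter_ID"] for r in rows if not r["gene_name"]]
    return rows, missing


# Hypothetical table: only NM_EXAMPLE1 has a linked gene.
table = {"NM_EXAMPLE1": ("NM_EXAMPLE1", "GENE_X")}
rows, missing = summarize(["NM_EXAMPLE1", "NR_000034"], table)
print(missing)
```

Here NR_000034 comes back with blank RefSeq and gene columns simply because the hypothetical table has no entry for it, which mirrors the situation in the second example row.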

In the first case, the distance was computed relative to some TSS, but there is no linked annotation. For the best answer, contact Chris Benner, who wrote the tool.

Cheers

score 1 · Answer 2 · 2012-08-15

12.7 years ago

Istvan Albert 102k

As Sukhdeep says (I am adding it here as an answer because I think it is the right one), the gene annotations are incomplete: it is possible to have a promoter without a correspondingly annotated gene, or without one close enough to be included in the report.

score 0 · Answer 3 · 2015-09-30

9.6 years ago

bwshi2012 • 0

I ran into the same problem. I also sent Chris Benner a message but have not received a reply.

I am using mm9, and the command I ran is:

annotatePeaks.pl wt2exp3_peaks.bed data/genomes/mm9 >wt2exp3_peakannotate.xls

It gives me a result like the following:

Does anyone know how to solve this problem?

ADD COMMENT • link 9.6 years ago by bwshi2012 • 0