Hi!
I am using the annotatePeaks.pl script from HOMER to annotate my peaks to the nearest RefSeq gene. If you have used the tool, you will notice that it generates 10 columns. The columns I am interested in are Distance to TSS, Nearest PromoterID, Nearest Refseq, and Gene Name.
Problem: At times a peak is annotated with a Nearest PromoterID and a Distance to TSS, while the Nearest Refseq and Gene Name columns are blank.
Since I want to find the gene nearest to each peak, Nearest Refseq and Gene Name are the columns I care about. If these two columns are blank but Distance to TSS is not, am I in trouble? Or is it completely normal for Nearest PromoterID to contain an ID while Nearest Refseq and Gene Name are blank?
I am completely confused.
Thank you, and please try to answer.
I haven't seen a case like that. Can you post an example file with 5-10 lines? There are cases where Nearest Ensembl, Gene Name, and Gene Alias are empty but the peak still has a Distance to TSS, and a Nearest PromoterID is given along with a Nearest Refseq; in most cases those two are the same. Another possibility is that there is no annotation in the RefSeq database for the gene behind that PromoterID. Try looking at the region in the UCSC Genome Browser to get a clue.
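If it helps to pull out the affected rows for posting, here is a rough Python sketch. It assumes the annotatePeaks.pl output is tab-delimited with a header row containing the column names Distance to TSS, Nearest Refseq, and Gene Name exactly as written here; check them against your own file's header. The function name `rows_missing_gene` and the three sample rows (including the peak names and IDs) are made up for illustration and are not real HOMER output.

```python
import csv
import io


def rows_missing_gene(handle,
                      required=("Distance to TSS",),
                      blank=("Nearest Refseq", "Gene Name")):
    """Return rows where every `required` column is filled
    but every `blank` column is empty.

    Column names are assumptions; adjust them to match the header
    of your own annotation file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        # `or ""` guards against short rows, where DictReader yields None
        has_required = all((row.get(c) or "").strip() for c in required)
        is_blank = all(not (row.get(c) or "").strip() for c in blank)
        if has_required and is_blank:
            hits.append(row)
    return hits


# Toy input, invented purely to exercise the function.
HEADER = ["PeakID", "Chr", "Start", "End", "Distance to TSS",
          "Nearest PromoterID", "Nearest Refseq", "Gene Name"]
ROWS = [
    ["peak1", "chr1", "100", "200", "-50", "NM_000001", "NM_000001", "GENE1"],
    ["peak2", "chr1", "500", "600", "1200", "", "", ""],
    ["peak3", "chr2", "300", "400", "-800", "NR_000002", "", ""],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"

hits = rows_missing_gene(io.StringIO(SAMPLE))
for row in hits:
    print(row["PeakID"], row["Distance to TSS"], row["Nearest PromoterID"])
```

For a real file you would pass `open("your_annotation.txt", newline="")` instead of the `io.StringIO` object. Note that in actual output the first column header may be longer than `PeakID`, so select that column by position or by its full header text.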
Hi Sukhdeep! Here is an example output file
In the first example only the distance to TSS is present and the rest of the columns are empty, while in the second example the distance to TSS and the promoter ID are present and the rest are empty.
I have used hg18.