As far as I can tell (by running tests manually), annotatePeaks.pl in the HOMER package [1] ignores strand information when searching for the Transcription Start Site (TSS) nearest to each peak. After a close inspection of the available settings, I couldn't identify one that changes this behaviour.
This was unexpected for two reasons:
- HOMER actually requires input to have a Strand column [2]
- The data from some NGS technologies are directional (e.g. CAGE), so strand information should be used when assigning the peaks to transcript/gene models
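To make the expected behaviour concrete, here is a minimal Python sketch of the difference between strand-agnostic and strand-aware nearest-TSS assignment. It is not HOMER's code; the `TSS` and `Peak` types, the `nearest_tss` function and the toy coordinates are all hypothetical and only illustrate the idea.

```python
from typing import NamedTuple, Optional, Sequence


class TSS(NamedTuple):
    gene: str
    chrom: str
    pos: int
    strand: str  # "+" or "-"


class Peak(NamedTuple):
    chrom: str
    pos: int     # peak centre
    strand: str  # "+" or "-"


def nearest_tss(peak: Peak, tss_list: Sequence[TSS], use_strand: bool) -> Optional[TSS]:
    """Return the TSS closest to the peak centre on the same chromosome.

    With use_strand=True only TSSs on the peak's strand are considered
    (hypothetical behaviour, for illustration only).
    """
    candidates = [
        t for t in tss_list
        if t.chrom == peak.chrom and (not use_strand or t.strand == peak.strand)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t.pos - peak.pos))


# Toy data: the closest TSS lies on the opposite strand from the peak.
tss_list = [
    TSS("geneA", "chr1", 1000, "+"),
    TSS("geneB", "chr1", 1200, "-"),
]
peak = Peak("chr1", 1150, "+")

print(nearest_tss(peak, tss_list, use_strand=False).gene)  # geneB (50 bp away, wrong strand)
print(nearest_tss(peak, tss_list, use_strand=True).gene)   # geneA (150 bp away, same strand)
```

For a directional assay such as CAGE, I would expect the second result (geneA), whereas the behaviour I observe corresponds to the first.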
Could someone please comment if I'm missing something here?
Thanks
(A couple more things to bear in mind when searching for an annotation tool:)
PeakAnnotator [1] also ignores strand information (according to its paper [2]), and this post [3] describes strand-related issues with the ChIPpeakAnno [4] R package.
[1] http://www.ebi.ac.uk/research/bertone/software
[2] https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-415
[3] http://guangchuangyu.github.io/2014/01/bug-of-r-package-chippeakanno/
[4] https://bioconductor.org/packages/release/bioc/html/ChIPpeakAnno.html