I am trying to perform Chip-Seq Analysis which is greatly described in this conversation. http://biology.stackexchange.com/questions/16701/introduction-to-chip-seq. The same as author of mentioned question I am a student of Applied Mathematics and I am doing my best to enter bioinformatics fields.
I am basically at the moment where I would like to annotate peaks with genes' IDs. To do this I am using ChIPpeakAnno R package from Bioconductor. I used below code to annotate my reads (outputs from MACS) with genes' ID's
library(ChIPpeakAnno)
# data(package = "ChIPpeakAnno")$results[,3]
macsOutput <- toGRanges(data="example_peaks.bed",
format = "MACS")
data(TSS.human.GRCh37)
macs.anno <- annotatePeakInBatch(macsOutput,
AnnotationData=TSS.human.GRCh37,
output="overlapping", maxgap=5000L)
library(org.Hs.eg.db)
macs.anno <- addGeneIDs(annotatedPeak=macs.anno,
orgAnn="org.Hs.eg.db",
IDs2Add="symbol")
# no annotations for some genes
as.character(head(as.data.frame(macs.anno)$symbol))
[1] "PTCHD2" "PTCHD2" "PTGER3" NA "HFM1" NA
but there occurs that there are no genes' annotations for some peaks. Can anyone tell me why this might happen? And how to avoid this? Does this refer to the maxgap=5000L
parameter? When creating output from MACS
I set a parameter for length to be 10,000.