Question

ChIPseeker missing full annotation


7.2 years ago

rbronste ▴ 420

I'm trying to assign GO and KEGG categories to some ChIP-seq peaks with ChIPseeker and am running into the following issue:

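# Load ChIPseeker, the mm10 UCSC transcript database and the mouse gene annotation package
library("ChIPseeker")
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(biomaRt)
library(rtracklayer)
library(org.Mm.eg.db)

# Read the peaks and annotate them within +/- 3 kb of the TSS
peak <- readPeakFile("gains.bed")
peakAnno <- annotatePeak(peak, tssRegion = c(-3000, 3000), TxDb = TxDb.Mmusculus.UCSC.mm10.knownGene, annoDb = "org.Mm.eg.db")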

During the annotatePeak step I get the following message, and I'm not sure exactly what it means:

>> preparing features information...         2017-09-26 10:39:55 
>> identifying nearest features...       2017-09-26 10:39:55 
>> calculating distance from peak to TSS...  2017-09-26 10:39:55 
>> assigning genomic annotation...       2017-09-26 10:39:55 
>> adding gene annotation...             2017-09-26 10:40:08 
'select()' returned 1:many mapping between keys and columns
>> assigning chromosome lengths          2017-09-26 10:40:08 
>> done...                   2017-09-26 10:40:08

ChIPSeeker ChIP-Seq KEGG GO • 3.1k views


score 1 · Answer 1 · 2017-09-26

Hi, the org.Mm.eg.db package relies on the AnnotationDbi package for annotation. AnnotationDbi retrieves annotation through the mapIds() function, which has four main arguments: (1) "keytype" (equivalent to "filters" in the biomaRt package), (2) "column" (equivalent to "attributes" in biomaRt), (3) "keys" (equivalent to "values" in biomaRt), and (4) "multiVals". The "multiVals" argument specifies what mapIds() should do when a key matches multiple values (see ?mapIds). With "first", which is the default, only the first match that comes back is returned; other options include "list", "filter", "asNA" and "CharacterList".
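
To make the multiVals options concrete, here is a minimal base-R sketch that mimics them on a toy table, so it runs without Bioconductor. The data frame ann, the IDs g1 to g3 and the helper collapse_ids() are hypothetical and are not part of AnnotationDbi. With real data you would instead call something like mapIds(org.Mm.eg.db, keys = ids, column = "SYMBOL", keytype = "ENTREZID", multiVals = "first"), where ids is your vector of Entrez IDs.

```r
# Hypothetical select()-style result: one row per (key, value) pair.
# Key "g1" maps to two symbols, so it is a one-to-many key.
ann <- data.frame(
  ENTREZID = c("g1", "g1", "g2", "g3"),
  SYMBOL   = c("SymA", "SymA2", "SymB", "SymC"),
  stringsAsFactors = FALSE
)

# Simplified emulation of the mapIds() multiVals strategies
# (illustrative only, not the AnnotationDbi implementation).
collapse_ids <- function(df, key, value,
                         multiVals = c("first", "list", "filter", "asNA")) {
  multiVals <- match.arg(multiVals)
  # Group values by key, keeping keys in order of first appearance
  keys <- unique(df[[key]])
  hits <- split(df[[value]], factor(df[[key]], levels = keys))
  switch(multiVals,
    # keep only the first value returned for each key
    first  = vapply(hits, function(v) v[1], character(1)),
    # keep every value, one list element per key
    list   = hits,
    # drop keys with more than one match entirely
    filter = unlist(hits[lengths(hits) == 1]),
    # keep every key, but use NA where there are multiple matches
    asNA   = vapply(hits, function(v) if (length(v) == 1) v else NA_character_,
                    character(1))
  )
}

collapse_ids(ann, "ENTREZID", "SYMBOL", "first")   # g1 -> "SymA"
collapse_ids(ann, "ENTREZID", "SYMBOL", "list")$g1 # "SymA" "SymA2"
```

Note how "filter" returns a shorter vector than the input, whereas "asNA" keeps every key but blanks out the ambiguous one.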

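So I guess that in your case there are several matches between your gene IDs and the gene symbols and/or gene names, and by default annotatePeak() keeps the first value that comes back. The "'select()' returned 1:many mapping between keys and columns" line is printed by AnnotationDbi's select() function and is an informational message, not an error: your log ends with ">> done...", so the annotation itself completed.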
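
If you want to see which genes triggered the 1:many message, you can count the rows returned per key in a select()-style table. The sketch below uses a hypothetical toy table (the g* and ens* IDs are made up); on real data the table would come from something like select(org.Mm.eg.db, keys = ids, keytype = "ENTREZID", columns = "ENSEMBL"). It also shows which rows a keep-the-first-hit rule would discard.

```r
# Hypothetical select()-style output with some one-to-many keys
ann <- data.frame(
  ENTREZID = c("g1", "g1", "g2", "g3", "g3", "g3"),
  ENSEMBL  = c("ensA", "ensB", "ensC", "ensD", "ensE", "ensF"),
  stringsAsFactors = FALSE
)

# Rows returned per key; a count above 1 marks a 1:many key
n_per_key  <- table(ann$ENTREZID)
multi_keys <- names(n_per_key)[n_per_key > 1]

# TRUE for the first row of each key, i.e. the row a "first hit" rule keeps
first_hit <- !duplicated(ann$ENTREZID)

# Rows that are silently dropped when only the first hit is kept
dropped <- ann[!first_hit, ]

multi_keys        # "g1" "g3"
dropped$ENSEMBL   # "ensB" "ensE" "ensF"
```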

It is also a good idea to visualize your sorted BED and BAM files (or the summit_file.bed) against the reference genome build, for example in the Golden Helix genome browser. You can then compare what you see with the annotation result from ChIPseeker.

N.B. By default, annotatePeak() annotates each region in the input file (the peaks, or the peak summits if you supply a summit file) to the nearest gene.
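
As a rough illustration of "nearest gene", here is a simplified base-R sketch on made-up coordinates: it takes each peak's midpoint and picks the gene whose TSS is closest. The peaks, gene names and the helper nearest_tss() are hypothetical, and this is not ChIPseeker's implementation, which also takes strand and chromosome into account.

```r
# Made-up peaks and TSS positions on a single chromosome
peaks <- data.frame(id = c("p1", "p2"),
                    start = c(1000, 5200), end = c(1200, 5400),
                    stringsAsFactors = FALSE)
genes <- data.frame(gene = c("geneA", "geneB", "geneC"),
                    tss  = c(800, 3000, 6000),
                    stringsAsFactors = FALSE)

# Simplified rule: assign each peak (via its midpoint) to the closest TSS.
# dist is the signed midpoint-minus-TSS offset, ignoring strand.
nearest_tss <- function(peaks, genes) {
  mid <- (peaks$start + peaks$end) / 2
  idx <- vapply(mid, function(m) which.min(abs(genes$tss - m)), integer(1))
  data.frame(peak = peaks$id,
             gene = genes$gene[idx],
             dist = mid - genes$tss[idx],
             stringsAsFactors = FALSE)
}

nearest_tss(peaks, genes)   # p1 -> geneA (dist 300), p2 -> geneC (dist -700)
```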

Tarek