I have peak (bed) files from a chipseq experiment. I would like to annotate these peaks with lncRNA annotations. What would be the best way to go about this?
Thanks!
You can use ChIPseeker. The only inputs needed for annotation are the bed file and a TxDb object, which can be generated from a gtf file downloaded from UCSC.
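As a rough sketch of the gtf route: the file names `lncRNA_annotation.gtf` and `my_peaks.bed` and the helper `build_txdb` are hypothetical placeholders, not part of the original answer. `makeTxDbFromGFF()` is provided by GenomicFeatures in older Bioconductor releases and by txdbmaker in newer ones, so the helper uses whichever is installed.

```r
# Sketch: build a TxDb from a local gtf file and annotate peaks with it.
# File names below are hypothetical; replace them with your own paths.

build_txdb <- function(gtf_path) {
  # Newer Bioconductor releases ship makeTxDbFromGFF() in txdbmaker,
  # older ones in GenomicFeatures. Use whichever is available.
  if (requireNamespace("txdbmaker", quietly = TRUE)) {
    txdbmaker::makeTxDbFromGFF(gtf_path, format = "gtf")
  } else {
    GenomicFeatures::makeTxDbFromGFF(gtf_path, format = "gtf")
  }
}

gtf_path <- "lncRNA_annotation.gtf"  # hypothetical lncRNA gtf file
bed_path <- "my_peaks.bed"           # hypothetical peak file

# Only run the annotation when both input files are actually present.
if (file.exists(gtf_path) && file.exists(bed_path)) {
  txdb <- build_txdb(gtf_path)
  peak <- ChIPseeker::readPeakFile(bed_path)
  anno <- ChIPseeker::annotatePeak(peak, TxDb = txdb)
  print(anno)
}
```

The chromosome names in the gtf need to match those in the bed file (for example `chr1` versus `1`), otherwise peaks will not be matched to any feature.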
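If you are using human hg19, you can use the lincRNA TxDb package available in Bioconductor, TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts, as in the session below.
Read the ChIPseeker documentation, especially the vignette, to find out more.
Note that the bed file used here is a ChIPseeker sample file and is not related to lincRNA.
> require(ChIPseeker)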
> getSampleFiles() -> x
> x[[1]]
[1] "/Library/R/library/ChIPseeker/extdata/GEO_sample_data/GSM1174480_ARmo_0M_peaks.bed.gz"
> peak=readPeakFile(x[[1]])
> peak
GRanges object with 812 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chrX [ 61728297, 61728780] *
[2] chr10 [ 39105185, 39105362] *
[3] chrY [ 13137266, 13137499] *
[4] chr11 [114049918, 114050234] *
[5] chrY [ 13107715, 13107867] *
... ... ... ...
[808] chrX [ 49239222, 49239305] *
[809] chrX [ 54945698, 54945789] *
[810] chrX [ 61817143, 61817176] *
[811] chrX [147048421, 147048507] *
[812] chrY [ 887860, 887931] *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
> require("TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts")
> txdb=TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts
> xx=annotatePeak(peak, TxDb=txdb)
>> preparing features information... 2016-10-06 14:31:03
>> identifying nearest features... 2016-10-06 14:31:03
>> calculating distance from peak to TSS... 2016-10-06 14:31:04
>> assigning genomic annotation... 2016-10-06 14:31:04
>> assigning chromosome lengths 2016-10-06 14:31:06
>> done... 2016-10-06 14:31:06
> xx
Annotated peaks generated by ChIPseeker
812/812 peaks were annotated
Genomic Annotation Summary:
Feature Frequency
6 Promoter (<=1kb) 0.7389163
7 Promoter (1-2kb) 1.2315271
8 Promoter (2-3kb) 0.3694581
4 Other Exon 0.6157635
1 1st Intron 3.2019704
5 Other Intron 3.0788177
3 Downstream (<=3kb) 0.4926108
2 Distal Intergenic 90.2709360
> as.GRanges(xx)
GRanges object with 812 ranges and 9 metadata columns:
seqnames ranges strand | annotation geneChr
<Rle> <IRanges> <Rle> | <character> <integer>
[1] chrX [ 61728297, 61728780] * | Distal Intergenic 23
[2] chr10 [ 39105185, 39105362] * | Distal Intergenic 10
[3] chrY [ 13137266, 13137499] * | Distal Intergenic 24
[4] chr11 [114049918, 114050234] * | Distal Intergenic 11
[5] chrY [ 13107715, 13107867] * | Distal Intergenic 24
geneStart geneEnd geneLength geneStrand geneId
<integer> <integer> <integer> <integer> <character>
[1] 61998718 61999787 1070 1 TCONS_l2_00030232
[2] 38933913 38982200 48288 2 TCONS_l2_00004140
[3] 13362085 13370619 8535 2 TCONS_l2_00030933
[4] 113887644 113888813 1170 2 TCONS_00019764
[5] 13362085 13370619 8535 2 TCONS_l2_00030933
transcriptId distanceToTSS
<character> <numeric>
[1] TCONS_l2_00030232 -269938
[2] TCONS_l2_00004140 -122985
[3] TCONS_l2_00030933 233120
[4] TCONS_00019764 -161105
[5] TCONS_l2_00030933 262752
-------
seqinfo: 24 sequences from hg19 genome
>
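The interactive session above could also be run as a script that saves the annotated peaks to a table. This is only a sketch: the helper `run_annotation` and the output file name `lincRNA_peak_annotation.tsv` are made up for illustration, and `tssRegion = c(-3000, 3000)` simply spells out the `annotatePeak()` default so the promoter window is visible.

```r
# Sketch: annotate a peak file against a TxDb and export the result.
# run_annotation() and the output file name are hypothetical.

run_annotation <- function(peak_file, txdb, out_file) {
  peak <- ChIPseeker::readPeakFile(peak_file)
  # tssRegion is written out explicitly; c(-3000, 3000) is the default.
  anno <- ChIPseeker::annotatePeak(peak, TxDb = txdb,
                                   tssRegion = c(-3000, 3000))
  # Convert the annotation object to a plain data frame for export.
  df <- as.data.frame(anno)
  write.table(df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(df)
}

# Run only if both packages used in the session are installed.
if (requireNamespace("ChIPseeker", quietly = TRUE) &&
    requireNamespace("TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts",
                     quietly = TRUE)) {
  sample_file <- ChIPseeker::getSampleFiles()[[1]]
  txdb <- TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts::TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts
  df <- run_annotation(sample_file, txdb, "lincRNA_peak_annotation.tsv")
  print(head(df[, c("seqnames", "start", "end",
                    "annotation", "transcriptId", "distanceToTSS")]))
}
```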
Hi there, my similar question is about how these regions are defined...
Promoter: these regions make sense.
Downstream (<=3kb): does <=3kb mean the entire three promoter subset regions?
Distal Intergenic: what defines this region? Is it >=3kb, or something even more distal, or just anything intergenic?
Sorry if my question is too simple for understanding these region definitions... ChIPseeker is awesome by the way!!
Diana
Dear Dr. Yu,
Thank you very much for providing an example here. I wonder if there is a way to use ChIPseeker to summarize features including tRNAs and CDS (protein-coding sequences) from a bed file of ChIP clusters. The genomic annotation summary would be similar to the example you provided here, but with tRNA and CDS features included, for example:
Genomic Annotation Summary:
Feature Frequency
1 CDS xxx
2 tRNA xxx
3 intron xxx
4 intergenic xxx
Thanks,
Xiao Lei