chipseq -- annotate peaks with lncRNA
8.1 years ago
apnri ▴ 40

I have peak (BED) files from a ChIP-seq experiment. I would like to annotate these peaks with lncRNA annotations. What would be the best way to go about this?

Thanks!

ChIP-Seq lncRNA peak annotation RNA-Seq • 4.8k views
8.1 years ago
Guangchuang Yu ★ 2.6k

You can use ChIPseeker.

The only inputs needed for annotation are the BED file and a TxDb object, which can be generated from a GTF file downloaded from UCSC.
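As a rough sketch of building the TxDb from a GTF file and passing it to ChIPseeker: the helper name `annotate_with_gtf` and its arguments are made up for illustration, and the file paths are whatever you supply. It assumes `makeTxDbFromGFF()` is available from GenomicFeatures; newer Bioconductor releases may provide it through the txdbmaker package instead.

```r
# Hypothetical helper (not part of ChIPseeker): build a TxDb from a GTF
# file and annotate a BED file of peaks against it.
# Assumes the GenomicFeatures and ChIPseeker packages are installed.
annotate_with_gtf <- function(bed_file, gtf_file, tss_region = c(-3000, 3000)) {
  # Build the transcript database from the GTF annotation.
  # (In newer Bioconductor releases this function may live in txdbmaker.)
  txdb <- GenomicFeatures::makeTxDbFromGFF(gtf_file, format = "gtf")

  # Read the peaks as a GRanges object.
  peak <- ChIPseeker::readPeakFile(bed_file)

  # Annotate each peak with its nearest feature in the TxDb.
  ChIPseeker::annotatePeak(peak, TxDb = txdb, tssRegion = tss_region)
}
```

It could be called as `anno <- annotate_with_gtf("peaks.bed", "lncRNA.gtf")`, where both file names are placeholders, and the result converted with `as.data.frame(anno)`.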

If you are using human hg19, you can use the following TxDb available in Bioconductor:

https://bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts.html

Read the ChIPseeker documentation, especially the vignette, to find out more.


DEMO

This just demonstrates the usage. The BED file used here is not related to lincRNA.

> require(ChIPseeker)

> getSampleFiles() -> x
> x[[1]]
[1] "/Library/R/library/ChIPseeker/extdata/GEO_sample_data/GSM1174480_ARmo_0M_peaks.bed.gz"
> peak=readPeakFile(x[[1]])
> peak
GRanges object with 812 ranges and 0 metadata columns:
        seqnames                 ranges strand
           <Rle>              <IRanges>  <Rle>
    [1]     chrX [ 61728297,  61728780]      *
    [2]    chr10 [ 39105185,  39105362]      *
    [3]     chrY [ 13137266,  13137499]      *
    [4]    chr11 [114049918, 114050234]      *
    [5]     chrY [ 13107715,  13107867]      *
    ...      ...                    ...    ...
  [808]     chrX [ 49239222,  49239305]      *
  [809]     chrX [ 54945698,  54945789]      *
  [810]     chrX [ 61817143,  61817176]      *
  [811]     chrX [147048421, 147048507]      *
  [812]     chrY [   887860,    887931]      *
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths

> require("TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts")
> txdb=TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts
> xx=annotatePeak(peak, TxDb=txdb)
>> preparing features information...         2016-10-06 14:31:03
>> identifying nearest features...       2016-10-06 14:31:03
>> calculating distance from peak to TSS...  2016-10-06 14:31:04
>> assigning genomic annotation...       2016-10-06 14:31:04
>> assigning chromosome lengths          2016-10-06 14:31:06
>> done...                   2016-10-06 14:31:06
> xx
Annotated peaks generated by ChIPseeker
812/812  peaks were annotated
Genomic Annotation Summary:
             Feature  Frequency
6   Promoter (<=1kb)  0.7389163
7   Promoter (1-2kb)  1.2315271
8   Promoter (2-3kb)  0.3694581
4         Other Exon  0.6157635
1         1st Intron  3.2019704
5       Other Intron  3.0788177
3 Downstream (<=3kb)  0.4926108
2  Distal Intergenic 90.2709360
> as.GRanges(xx)
GRanges object with 812 ranges and 9 metadata columns:
        seqnames                 ranges strand |        annotation   geneChr
           <Rle>              <IRanges>  <Rle> |       <character> <integer>
    [1]     chrX [ 61728297,  61728780]      * | Distal Intergenic        23
    [2]    chr10 [ 39105185,  39105362]      * | Distal Intergenic        10
    [3]     chrY [ 13137266,  13137499]      * | Distal Intergenic        24
    [4]    chr11 [114049918, 114050234]      * | Distal Intergenic        11
    [5]     chrY [ 13107715,  13107867]      * | Distal Intergenic        24
        geneStart   geneEnd geneLength geneStrand            geneId
        <integer> <integer>  <integer>  <integer>       <character>
    [1]  61998718  61999787       1070          1 TCONS_l2_00030232
    [2]  38933913  38982200      48288          2 TCONS_l2_00004140
    [3]  13362085  13370619       8535          2 TCONS_l2_00030933
    [4] 113887644 113888813       1170          2    TCONS_00019764
    [5]  13362085  13370619       8535          2 TCONS_l2_00030933
             transcriptId distanceToTSS
              <character>     <numeric>
    [1] TCONS_l2_00030232       -269938
    [2] TCONS_l2_00004140       -122985
    [3] TCONS_l2_00030933        233120
    [4]    TCONS_00019764       -161105
    [5] TCONS_l2_00030933        262752
  -------
  seqinfo: 24 sequences from hg19 genome
>
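Since the demo peaks are mostly far from any lincRNA, one possible follow-up is to keep only the peaks within some distance of a lincRNA TSS. The sketch below works on a plain data frame with a `distanceToTSS` column, such as the one `as.data.frame(xx)` should give. The helper `peaks_near_tss` and the cutoff values are illustrative assumptions, and the toy rows are copied from the first five peaks printed above.

```r
# Hypothetical helper: keep peaks whose nearest TSS lies within max_dist bp.
# anno_df is assumed to be a data frame with a numeric distanceToTSS column.
peaks_near_tss <- function(anno_df, max_dist = 3000) {
  stopifnot("distanceToTSS" %in% names(anno_df))
  anno_df[abs(anno_df$distanceToTSS) <= max_dist, , drop = FALSE]
}

# Toy input: the first five annotated peaks from the demo output.
demo <- data.frame(
  seqnames      = c("chrX", "chr10", "chrY", "chr11", "chrY"),
  start         = c(61728297, 39105185, 13137266, 114049918, 13107715),
  end           = c(61728780, 39105362, 13137499, 114050234, 13107867),
  transcriptId  = c("TCONS_l2_00030232", "TCONS_l2_00004140",
                    "TCONS_l2_00030933", "TCONS_00019764",
                    "TCONS_l2_00030933"),
  distanceToTSS = c(-269938, -122985, 233120, -161105, 262752),
  stringsAsFactors = FALSE
)

# A deliberately loose cutoff, chosen only so the toy data returns something.
near <- peaks_near_tss(demo, max_dist = 200000)
print(near[, c("seqnames", "transcriptId", "distanceToTSS")])
```

With real data a much tighter cutoff (for example the default 3000 bp used here) is probably more meaningful.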

Dear Dr. Yu,

Thank you very much for providing an example here. I wonder if there is a way to use ChIPseeker to summarize features such as tRNAs and CDS (protein-coding sequences) from a BED file of ChIP clusters. The genomic annotation summary would be similar to the one in your example, but with tRNA and CDS included as features, for example:

Genomic Annotation Summary:
     Feature  Frequency
1        CDS        xxx
2       tRNA        xxx
3     intron        xxx
4 intergenic        xxx

Thanks,

Xiao Lei

8.0 years ago
dcwest • 0

Hi there, my similar question is about how these regions are defined. The promoter regions make sense, but what about Downstream (<=3kb)? Does <=3kb mean the entire three promoter subset regions? And Distal Intergenic: what defines this region? Is it >=3kb, or something even more distal, or just anything intergenic? Sorry if my question is too simple for understanding these region definitions. ChIPseeker is awesome, by the way!

Diana
