Hi,
I am using the ChIPpeakAnno package to annotate peaks called by MACS2 from my ChIP-seq data with the gene name of their nearest TSS. The annotation itself works well, but I am getting a lot of LRG symbols in addition to Ensembl IDs. Although it is quite easy to convert LRG symbols to their corresponding HGNC gene symbols using this browser (https://www.lrg-sequence.org/search/?), I am looking for a way to avoid getting the LRG symbols in the first place.
Below is the R code I am using to annotate the narrowPeak file.
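library(ChIPpeakAnno)
library(data.table)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(EnsDb.Hsapiens.v86)
library(dplyr)

# Import the MACS2 peaks as a GRanges object
gr1 <- toGRanges("HepG2-5FU_TP53_vs_Input_ChIPseq_peaks.narrowPeak", format = "narrowPeak", header = FALSE)
# Gene-level annotation from the Ensembl v86 database
annotation <- toGRanges(EnsDb.Hsapiens.v86, feature = "gene")
# Annotate each peak with its nearest TSS; the annotation is passed by name as AnnotationData
annotated_chippeaks <- annotatePeakInBatch(gr1, AnnotationData = annotation, featureType = "TSS")
# Write the annotated peaks to a tab-separated file
output_file <- "annotated_peaks.tsv"
write.table(annotated_chippeaks, file = output_file, sep = "\t", quote = FALSE, row.names = FALSE)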
I would be grateful if someone could help me omit the LRG symbols from my results. Thank you.
Note: Some of the peaks annotated with LRG symbols are also annotated with Ensembl IDs, and those duplicates are easy to remove. However, most of them are annotated only with LRG IDs. For a long list of genes, converting the LRG symbols manually takes quite a long time.
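One possible way to avoid the LRG entries is to drop them from the annotation before calling annotatePeakInBatch, so that each peak can only be assigned to a non-LRG gene. The following is a minimal sketch, not a tested solution: `drop_lrg` is a hypothetical helper (not part of ChIPpeakAnno), and it assumes that the names of the annotation object are the gene IDs and that LRG records have IDs starting with "LRG_". The toy vector is illustrative only.

```r
# Hypothetical helper (not a ChIPpeakAnno function): remove entries whose
# name starts with "LRG_". It works on any named object that supports
# logical subsetting with `[`, such as a named vector or a GRanges.
drop_lrg <- function(x) {
  ids <- names(x)
  if (is.null(ids)) stop("`x` must have names (expected to be gene IDs)")
  x[!grepl("^LRG_", ids)]
}

# Toy named vector standing in for the annotation (illustrative IDs only)
toy <- c(ENSG00000141510 = "TP53", LRG_1 = "LRG_1", ENSG00000012048 = "BRCA1")
kept <- drop_lrg(toy)

# With the objects from the code above, assuming names(annotation) are gene IDs:
# annotation <- drop_lrg(toGRanges(EnsDb.Hsapiens.v86, feature = "gene"))
# annotated_chippeaks <- annotatePeakInBatch(gr1, AnnotationData = annotation, featureType = "TSS")
```

Filtering the annotation first, rather than removing LRG rows from the output table, should mean that peaks currently annotated only with an LRG ID are assigned to the nearest remaining gene instead of being lost.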
sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: Europe/Kiev
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_2.4.5 usethis_2.2.2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 dplyr_1.1.3 blob_1.2.4
[4] filelock_1.0.2 Biostrings_2.68.1 bitops_1.0-7
[7] fastmap_1.1.1 RCurl_1.98-1.12 BiocFileCache_2.10.1
[10] promises_1.2.1 GenomicAlignments_1.36.0 XML_3.99-0.14
[13] digest_0.6.33 mime_0.12 lifecycle_1.0.4
[16] ellipsis_0.3.2 KEGGREST_1.42.0 RSQLite_2.3.1
[19] magrittr_2.0.3 compiler_4.3.1 rlang_1.1.1
[22] progress_1.2.3 tools_4.3.1 utf8_1.2.3
[25] yaml_2.3.7 rtracklayer_1.60.1 knitr_1.45
[28] htmlwidgets_1.6.4 prettyunits_1.2.0 S4Arrays_1.1.6
[31] pkgbuild_1.4.3 bit_4.0.5 curl_5.1.0
[34] DelayedArray_0.27.10 xml2_1.3.5 pkgload_1.3.3
[37] abind_1.4-5 BiocParallel_1.34.2 miniUI_0.1.1.1
[40] purrr_1.0.2 BiocGenerics_0.48.1 grid_4.3.1
[43] stats4_4.3.1 fansi_1.0.4 urlchecker_1.0.1
[46] profvis_0.3.8 xtable_1.8-4 biomaRt_2.58.0
[49] SummarizedExperiment_1.32.0 cli_3.6.1 crayon_1.5.2
[52] remotes_2.4.2.1 generics_0.1.3 rstudioapi_0.15.0
[55] httr_1.4.7 rjson_0.2.21 sessioninfo_1.2.2
[58] DBI_1.2.1 cachem_1.0.8 stringr_1.5.1
[61] zlibbioc_1.46.0 parallel_4.3.1 AnnotationDbi_1.64.1
[64] BiocManager_1.30.22 XVector_0.40.0 restfulr_0.0.15
[67] matrixStats_1.0.0 vctrs_0.6.3 Matrix_1.6-1.1
[70] IRanges_2.34.1 hms_1.1.3 S4Vectors_0.38.1
[73] bit64_4.0.5 GenomicFeatures_1.54.1 glue_1.6.2
[76] codetools_0.2-19 stringi_1.7.12 later_1.3.2
[79] GenomeInfoDb_1.38.5 BiocIO_1.12.0 GenomicRanges_1.52.0
[82] tibble_3.2.1 pillar_1.9.0 rappdirs_0.3.3
[85] htmltools_0.5.7 GenomeInfoDbData_1.2.11 R6_2.5.1
[88] dbplyr_2.4.0 evaluate_0.23 shiny_1.8.0
[91] Biobase_2.60.0 lattice_0.21-9 highr_0.10
[94] png_0.1-8 Rsamtools_2.16.0 memoise_2.0.1
[97] httpuv_1.6.13 Rcpp_1.0.12 SparseArray_1.1.12
[100] xfun_0.41 MatrixGenerics_1.14.0 fs_1.6.3
[103] pkgconfig_2.0.3